• Open

    Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence [N]
    https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/ It looks like content will have to be labeled, showing if it's AI-generated or not. And special rules will apply to: any model that was trained using a quantity of computing power greater than 1026 integer or floating-point operations, or using primarily biological sequence data and using a quantity of computing power greater than 1023 integer or floating-point operations; and any computing cluster that has a set of machines physically co-located in a single datacenter, transitively connected by data center networking of over 100 Gbit/s, and having a theoretical maximum computing capacity of 1020 integer or floating-point operations per second for training AI. Also, easier visas for "AI talent". submitted by /u/we_are_mammals [link] [comments]  ( 9 min )
    [D] Relevance Extraction in RAG Pipelines
    I came across this interesting problem in RAG, what I call Relevance Extraction. After retrieving relevant documents (or chunks), these chunks are often large and may contain several portions irrelevant to the query at hand. Stuffing the entire chunk into an LLM prompt impacts token-cost as well as response accuracy (distracting the LLM with irrelevant text), and and can also cause bumping into context-length limits. So a critical step in most pipelines is Relevance Extraction: use the LLM to extract verbatim only the portions relevant to the query. This is known by other names, e.g. LangChain calls it Contextual Compression, and the RECOMP paper calls it Extractive Compression https://twitter.com/manelferreira_/status/1713214439715938528 Thinking about how best to do this, I realized i…  ( 10 min )
    [D] Face Recognition
    I am working for a months on facial recognition system. I have 500 classes with 10 template for each class. I have applied almost and tested almost every model. Some of them are dlib 128 dim. Facenet (512), Insightface, Arcface. But none of them reduced the rate of false positive. I also fine tuned those models, I also trained a clasifier and I reached to 91 percent of accuracy with a very low quality images and also have a backlight effect. But the what I want my setup to be 99 percent which correctly classifies the person with a high similarity on true positive and a low similarity around 0.1 or 0.2 for an unknown class. Now my problem is that similarity is also high with unknown almost around 99 and I also used some distance metrics and it also not helping out. Note: the environment is a real world environment and I used CCTVs at a place where hundreds of unknown people visit daily. submitted by /u/No_Garbage9512 [link] [comments]  ( 9 min )
    [D] Computation graphs with in-place operations
    With a computation graph typically used to represent NNs, where nodes represent operations and edges represent data dependencies, is it possible to represent in-place operations? submitted by /u/thanrl [link] [comments]  ( 9 min )
    [D] How to evaluate if LLMs are following certain guidelines or not?
    I recently wrote a blog on evaluating whether your LLM applications are following required guidelines: https://uptrain.ai/blog/lost-in-translation-the-critical-impact-of-neglecting-guideline-adherence-in-llms?utm_source=reddit&utm_medium=reddit&utm_campaign=reddit Please let me know your feedback in the comment section submitted by /u/Vegetable-Skill-9700 [link] [comments]  ( 9 min )
    [R] RedPajama-Data-v2: an Open Dataset with 30 Trillion Tokens for Training Large Language Models
    Blog: https://together.ai/blog/redpajama-data-v2 Hugging Face: https://huggingface.co/datasets/togethercomputer/RedPajama-Data-V2 GitHub: https://github.com/togethercomputer/RedPajama-Data Description: RedPajama-V2 is an open dataset for training large language models. The dataset includes over 100B text documents coming from 84 CommonCrawl snapshots and processed using the CCNet pipeline. Out of these, there are 30B documents in the corpus that additionally come with quality signals, and 20B documents that are deduplicated. submitted by /u/APaperADay [link] [comments]  ( 9 min )
    [R] FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions
    Paper: https://arxiv.org/abs/2310.15421 Code: https://github.com/skywalker023/fantom Blog: https://hyunw.kim/fantom/ Abstract: Theory of mind (ToM) evaluations currently focus on testing models using passive narratives that inherently lack interactivity. We introduce FANToM 👻, a new benchmark designed to stress-test ToM within information-asymmetric conversational contexts via question answering. Our benchmark draws upon important theoretical requisites from psychology and necessary empirical considerations when evaluating large language models (LLMs). In particular, we formulate multiple types of questions that demand the same underlying reasoning to identify illusory or false sense of ToM capabilities in LLMs. We show that FANToM is challenging for state-of-the-art LLMs, which perform significantly worse than humans even with chain-of-thought reasoning or fine-tuning. ​ https://preview.redd.it/mxb85o2vkexb1.png?width=1367&format=png&auto=webp&s=8749cddd15e6740e69ae47ef5edf3a1da96d89c2 submitted by /u/APaperADay [link] [comments]  ( 9 min )
    [D] Who's A/B testing in prod?
    I never hear much about A/B testing in the ML community. Do you all tend to A/B test your changes? I have to imagine this is becoming more prevalent with LLMs and prompts. In my experience, every model change we made was through an A/B test. There is some literature in our domain that offline metrics may not correlate to online metrics, so we looked at both very closely. My team was heavily involved in A/B testing as well as managing the data pipeline from those exposures to create training data for the models. What do your experience look like? submitted by /u/mtbarta [link] [comments]  ( 9 min )
    [P] Looking for partner for a doing a research paper in ML/NLP politeness identification
    [P] Dm for further info but all in all its abt adding an extra feature (feature engg.) To our ML model to detect politeness in a language submitted by /u/PrudentFly8507 [link] [comments]  ( 9 min )
    [P] Deconstructing interface design for generative AI
    Recently, niji journey launched a mobile app, and there, they talked about how hard it is for people who are new to the concept of generative AI art to get started, and some of the workarounds they came up with. https://sizigi.notion.site/Kindling-the-Spark-of-Inspiration-d8edd06cede04401b4efba8324005cf3?pvs=4 Thought it would be interesting to share, especially with how they try to push forward the ideas from other users as a font of knowledge kind of thing. submitted by /u/kalmatos [link] [comments]
    [D] WER improved before Fine-Tuning Whisper, but increased afterward: Why?
    Hey everyone, I've been working on fine-tuning Whisper using generated transcribed audio data. Before I began the fine-tuning, I evaluated the base model's accuracy on a test set, which showed: Test_accuracy: WER = 23.078% I then fine-tuned the model for 3,000 steps using a dataset of 1,000 samples: 700 for training and 300 for testing. This amounts to roughly 4 hours of data, with each audio clip being around 30 seconds in duration. However, post-fine-tuning, the WER started off at 30%, which is higher than the base model's 23.078%. Intuitively, I'd expect the WER to start lower after fine-tuning. Does anyone have insights or suggestions on what might be causing this discrepancy? Any advice would be greatly appreciated! https://preview.redd.it/b619v95rgdxb1.png?width=906&format=png&auto=webp&s=fa17f311301919ccfcd7a8cea47f47478e1cd91d Edit: Solved the issue. submitted by /u/stoicbats_ [link] [comments]  ( 9 min )
    [R] Do any solutions exist for this ranking optimization problem
    Let’s say I already have a ranking method (a learning to rank approach) to sort search results (items) based on their utility to user. However, I also want to make sure certain groups of items get enough exposure across all the searches. I have defined target exposure for each group but I’m struggling to find a method that would balance utility and this exposure. Particularly because utility can be defined for a given search and item, but the group target exposure is only defined for an aggregated set of search results. Any ideas for a multi objective optimization method or even a post hoc re-ranking approach? submitted by /u/MLE-MAP [link] [comments]  ( 9 min )
    From Data Architect to Machine Learning Enthusiast [D]
    Please read, clap and follow: https://medium.com/@andysingal/from-data-architect-to-machine-learning-enthusiast-e132f6cd35fc?sk=d7d2a04a08ca7f328c4ae7ee3769ec50 submitted by /u/Fit_Maintenance_2455 [link] [comments]
    [D] Pixel Perfect Segmentation Datasets?
    Hi! Looking for any insights/direction on high-quality (near pixel-perfect) segmentation datasets for benchmarking various segmentation models in a way that others can reproduce - not sure if these exist or if anyone has specific examples they have worked with? Working with the usual suspects for segmentation like ADE20K and Cityscapes I have seen results using the pre-trained models on these datasets are a bit disappointing wrt high quality (clean edges, label consistency) segmentation - even models considered SOTA in 2023 like OneFormer/SegFormer/etc. Though my hypothesis is that this is largely due to the label approximations (many are drawn as rough polygons) and labeling inconsistencies building these large datasets with many classes. My observations and personal experience is many private companies have custom, high-quality segmentation data they have produced internally at high financial/time cost and thus are reluctant to open source and share, but hoping there are some hidden gems out there I don't yet know about... submitted by /u/FocalAIDev [link] [comments]  ( 9 min )
    [P] Predicting velocity vectors with a CNN
    Hi all I am, working on a project to predict velocity vectors from boundary conditions... I have carried out a simulation of an office on Ansys I have x,y,z velocity vectors I have repeated the simulation for 100s of ac inlet and outlet velocities as input data for each set of 3D-Velocity vectors.Example of the dataset for each inlet, outlet permutation, the dataset for each permutation is around 1 million coordinates + xyz velocity vectors long. *** I now want to predict the x,y,z velocity vectors at each location based on the AC inlet and outlet velocity *** How do I organise the data best for a CNN/other ANN? Any other tips of the CNN architecture? submitted by /u/No_Range3026 [link] [comments]  ( 9 min )
    [N] Fast GPT Training Infra, FP8-LM, being 64% faster than BF16 on H100—Unlocking even more gigantic GPT
    I just discovered the FP8-LM paper from MS: [2310.18313] FP8-LM: Training FP8 Large Language Models (arxiv.org). This is their repo link: Azure/MS-AMP: Microsoft Automatic Mixed Precision Library (github.com) paper abstraction My Key Takeaways: The whole-loop for FP8 “GPT-style” large model training is successfully done by FP8-LM team, including data cleaning, infrastructure development, model pretraining, alignment (SFT, RS, RLHF, etc.) Their FP8 mixed-precision training framework got 42% reduction in memory usage, and ran 64% faster than BF16 Megatron-LM; also faster than Nvidia Transformer Engine by 17% ​ https://preview.redd.it/jeaadb1jncxb1.png?width=793&format=png&auto=webp&s=2175969217ff0ff3c8149d17b8011408f4f84c91 It is thrilling to think about that we can scale up the already gigantic model size by 2.5x without needs for more GPU memory…and this can be achieved with NO performance degradation on a wide range of benchmarks as demonstrated in the paper. ​ https://preview.redd.it/vlu6o5cnncxb1.png?width=1389&format=png&auto=webp&s=ed97ea1431f8d9a2900490812f23131681c788f8 ​ https://preview.redd.it/murtte9oncxb1.png?width=1289&format=png&auto=webp&s=6ebd242d69380f2bd95dcd2fa2afe18d7c4b3667 submitted by /u/TensorTamer [link] [comments]  ( 9 min )
    [P] Active Learning with Domain Experts – Sort of a case study
    Hey, r/MachineLearning, it’s Dean from DagsHub 🐶 We recently had the opportunity to work with a domain expert in creating a machine learning model and we thought we should share what we learned in the process. A dentist reached out to us to help him create a machine learning model, which could segment teeth in panoramic X-rays. He had some data pre-labeled, but the vast majority of his dataset was unlabeled. Since labeling these X-rays is a time consuming process and requires domain knowledge, we decided to use Active Learning. Following our success in creating an Active Learning pipeline in a Jupyter Notebook using Data Engine, we created a new Tooth Fairy project, which expands on that and brings even more capabilities into the notebook. https://dagshub.com/blog/active-learning-with-domain-experts-a-case-study/ Check out our post and learn: Why and when you should use Active Learning How to efficiently work with domain experts (and mistakes to avoid!) What a real use-case Active Learning pipeline looks like, by checking out the accompanying repo We value your thoughts and feedback! Looking forward to hearing from you all! submitted by /u/PhYsIcS-GUY227 [link] [comments]  ( 9 min )
    [D] Semantic search on different medical codes
    Hi everyone, Need some ideas to bounce off. I have several medical codes, let’s name them A, B, C and D. Each medical code consists of multiple clauses, say, 1.1, 1.2 and so on. I want to create a model (?) where a text input of a textual clause will show up all other related clauses from different medical codes. For example, if I input clause 3.2 from medical A, I want the output to show up the related/similar clauses from code B, C and D. I have thought of using something like a Retrieval Augmented Generation for this, but anyone has any better ideas regarding this topic? Could a language model do something about this? Thanks! submitted by /u/plsendfast [link] [comments]  ( 9 min )
    Predicting clusters with regression [D]
    Hello. I have historical data of real estate. Includes attributes like price, revenue, maintenance, maintenance debt etc. I had two ideas for the data previously, to predict future real estate price and to use some clustering algorithm to put the properties into categories (good, average bad or something like that). Now I got this idea to generate clusters for any given point in time and use that history of clusters to predict migration of properties between clusters. I am aware of inter cluster migration estimation but that seems to be a prediction of how points shift inside a cluster with the introduction of new data points rather than a timeline of the movement of points in the cluster. Writing this I'm also thinking it might be possible to simply treat the history of clusters as a category to be predicted e.g. given the history of property X that is in category A the prediction is that in 6 months it's heading for cluster B. Does anyone have experience with similar work? Are there papers I am missing? Thanks in advance. submitted by /u/arachnarus96 [link] [comments]  ( 9 min )
    [D] Classification problem giving me white hair
    So! I am a ML newbie and was wondering if one of you pros can help me out on my learning journey (tool use = google colab). I have a csv file containing loan data where each row is a customer that applied for a loan. One of the columns is called TARGET and it shows whether the customer's loan request was approved or not. All sorts of data points are captured e.g. age, gender, salary, employment details like industry, assets, etc. I've done cross validation and found that GradientBasedClassifier and LGBM perform the best. Cross validation also tells me that their accuracy is between 68%-70%. My problem is that I SUCK at hyper param optimisation. How do you go from 68 to +80%??? Or 90%? For the curious ones, here is the dataset: https://drive.google.com/file/d/1IKNVstck6gnXvfGS-mVRMAE1RFrDNUgZ/view?usp=sharing submitted by /u/Critical_Ad_1205 [link] [comments]  ( 9 min )
    How to extract features from XML of svg images dataset to create features? [D]
    I want to create several labels for each of these images, such as no. of bedrooms, bathrooms, etc. I have a large dataset of 2000 images, each which have XML which contains text that labels each room with the ID.How do I automate the process, and create a dataset that includes all the features for the labels, so that it's ready for machine learning? https://svgshare.com/i/z53.svg submitted by /u/pranksbanker [link] [comments]  ( 9 min )
    [R] The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI
    submitted by /u/Jean-Porte [link] [comments]  ( 8 min )
    [D] How do you deal with LLM observability? What tools do you guys use?
    I want to know the tools and methods you use for the observability and monitoring of your ML (LLM) performance and responses in production. submitted by /u/Ok_Cartographer5609 [link] [comments]  ( 9 min )
    [R] SuperResolution: DLSS, VSR
    Hey everyone! I recently delved deep into the fascinating world of Super Resolution and put together a tutorial covering both SRGAN and ESRGAN. Plus, for my fellow gamers out there, I've included a comparison between DLSS and VSR in gaming scenarios. For those interested in the technical details, there's also a hands-on PyTorch walk-through of ESRGAN. I put a lot of effort into making the content approachable and informative, so whether you're a newbie or a seasoned pro, there should be something for everyone! Check out the video here: https://youtu.be/Z0jl8YM5kzU Would love to get your feedback, thoughts, or any insights you might want to share! Cheers! 🍻 Disclaimer: This video provides a broad overview of SuperResolution. NVIDIA's DLSS technology involves real-time rendering complexities that may not be fully detailed. For a deeper technical dive, we value and encourage viewers own research. We can have a separate video on DLSS as well. submitted by /u/Worldly-Inflation-92 [link] [comments]  ( 9 min )
  • Open

    Identifiers depend on context
    Can you tell who someone is from their telephone number? That’s kinda the point of telephone numbers, to let you contact someone. And indeed telephone number is one the 18 identifiers under HIPAA Safe Harbor. But whether any piece of information allows you to identify someone depends on context. If you don’t have access to […] Identifiers depend on context first appeared on John D. Cook.  ( 5 min )
  • Open

    Samsung apps
    Hello everyone, I've recently started tinkering with ai art generators and could use some advice. I'm on android and currently using magir app, I'm wondering if there are any free ai art generators with no restrictions/ have nsfw etc that doesn't come with a pay wall, magir has lifetime pro for $40, I'm still learning but certainly see the potential so I'm asking for group knowledge please and thank you in advance. submitted by /u/gundamt51 [link] [comments]
    Europe's newest AI hub is being built in a German city no one's heard of
    submitted by /u/donutloop [link] [comments]
    Anyone tried boosting GPT with other AI tools? What were your results?
    TL;DR - Title. I’m a grad student and had to do some work on a study on the socio-economic impacts of AI and developing an interactive educational platform to ease learning for visually impaired students. I’ve had to do ‘delegate’ some work to chatgpt, and the results have been kinda unimpressive. I shared this with a colleague and he said Ai results are like that, and if I wanted better results I could mix and match, or use other ai tools to ‘boost’ (?) gpt. Is this a viable strategy, or do I have to make do with whatever I have? submitted by /u/CrispOriginality [link] [comments]
    Humanity at risk from AI 'race to the bottom', says tech expert
    A tech expert warns that unrestrained AI development by a few tech companies is endangering humanity's future. The expert calls for AI safety standards and regulation to prevent the reckless development of powerful AI systems. In a policy document, AI experts argue that governments should have the authority to halt the development of exceptionally powerful AI models. Concerns about the development of artificial general intelligence, which can perform tasks at or above human levels, are also raised. The article mentions the investments made by Amazon, Microsoft, Alphabet, and Facebook's Meta in AI and cloud computing. Source : https://www.theguardian.com/technology/2023/oct/26/ai-artificial-intelligence-investment-boom submitted by /u/NuseAI [link] [comments]
    Why does Google Tensor exist if Snapdragon is better at AI? Short
    The article discusses the existence of Google Tensor in light of the impressive AI capabilities of the Snapdragon 8 Gen 3 chip. While Tensor has been praised for bringing AI breakthroughs to Pixel phones, some argue that many of its AI features actually rely on an internet connection and offload tasks to the cloud. In comparison, the Snapdragon chip can perform on-device AI tasks, such as generating images, quickly and without the need for an internet connection. Despite the criticisms, one argument for Tensor is its longer support timeline and the ability for Google to focus on specific AI applications. However, after seeing Qualcomm's AI demos, the article questions the validity of Tensor's main pitch for AI. Source : https://9to5google.com/2023/10/29/snapdragon-8-gen-3-google-tensor-ai/ submitted by /u/NuseAI [link] [comments]
    Image edition tool to merge images
    Hey! I have a specific background that I would like to use together with various product pictures that have varying backgrounds. Essentially I am looking for a tool that can remove backgrounds of my product images, then add a certain background I have and do the shadows and lightning well on the background. Any ideas? For example in adobe firefly I can remove the background and generate a new one with a prompt but it’s not exactly like the background I would like to use and I need it to be 95% similar at least. Thanks in advance! submitted by /u/Herrpadda [link] [comments]
    New Chinese style
    submitted by /u/Sea_Permit5660 [link] [comments]
    One-Minute Daily AI News 10/29/2023
    The Ai Pin, the new gadget / wearable device / projector / thing from the secretive startup Humane, might cost as much as $1,000 and may require a monthly subscription for data, according to The Information.[1] OpenAI to release updated version of ChatGPT that gives users access all GPT-4 tools – including browsing and DALL·E 3 – without switching.[2] MimicGen: A Data Generation System for Scalable Robot Learning using Human Demonstrations.[3] Vietnam is at the ‘leading edge’ of AI developments in emerging Southeast Asia: JPMorgan.[4] Sources: [1] https://www.theverge.com/2023/10/27/23935644/humane-ai-pin-price-subscription [2] https://www.searchenginejournal.com/new-version-of-chatgpt-gives-access-to-all-gpt-4-tools-at-once/499607/ [3] https://mimicgen.github.io/ [4] https://www.cnbc.com/video/2023/10/30/vietnam-ahead-in-ai-developments-in-emerging-southeast-asia-jpmorgan.html submitted by /u/Excellent-Target-847 [link] [comments]
    Kindred Spirit, Anne <3
    submitted by /u/Oh_my_Winnie [link] [comments]
    DUDE GOING WILD 🎸🎸🎸🎸🎸🎸🎸 🤣
    submitted by /u/the_anonymizer [link] [comments]
    AI Photo Generator For Indie Record Label
    I run a small independent record label. Of course, content is important and I am looking at ways for maximizing my content on a budget. Photographers are very expensive and so is having a videographer/photographer at each show for the artist to take pictures. Can anyone suggest a good AI program where I can use my artists as models and could potentially create good photo content? Any advice or input would be helpful! Thank you. submitted by /u/mc7eunit [link] [comments]
  • Open

    Runtime Error with custom environment
    I created a custom environment to use it with RL algorithms. My custom environment provides the observations and receives rewards. I test my environment with my custom RL algorithm (Policy gradient) and the algorithms in stable-baselines3. My custom algorithm and DQN from stable-baselines3 works. However, if I use baselines with my custom algorithm or use PPO or A2C from sb3, I get: RuntimeError: could not create a primitive descriptor for a matmul primitive Any idea why this is happening? Extra details: On the custom algo error occurs here: "/app/src/networks/critics.py", line 40, in forward return self.network(obs) submitted by /u/FragrantCockroach8 [link] [comments]
    Reward Function
    Hello everyone I am taking reinforcement learning course. I am studying the course from two different books. In one of them it says R(s,a) -> [0,1] in other it says -> R. I am confused about the linitation of the reward function. This is the link of book [0,1] (for infinite mdp for finite mdp it says [0.1] rh)(s,a), adds trajectory ) . I want learn that is one of them false or am i missing smth submitted by /u/karakobra1 [link] [comments]
    The amazing success stories of Reinforcement Learning
    A beginner friendly introduction to Reinforcement Learning and the intuitions behind it… with four seminal projects that revolutionized the field of AI and RL. submitted by /u/AvvYaa [link] [comments]
    Unable to create a custom gym environment
    I put up a post about this earlier but soon realized that I hadn't given much information. ​ In order to create my custom gym environment, I did the following things - I went over the documentation given over here. I cloned this repository. In the folder `gym-examples/gym-examples`, I created the file - `my_test.py`. The contents of the file are given over here -``` ​ import gym env = gym.make('gym_examples/GridWorld-v0') print("Did this work?")``` However, when I try to run this file, I get the error - ``` File "D:\custom_env\gym-examples\my_test.py", line 2, in env = gym.make('gym_examples/GridWorld-v0') ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\thoma\anaconda3\envs\env_torch\Lib\site-packages\gym\envs\registration.py", line 569, in make _check_version_exists(ns, name, version) File "C:\Users\thoma\anaconda3\envs\env_torch\Lib\site-packages\gym\envs\registration.py", line 219, in _check_version_exists _check_name_exists(ns, name) File "C:\Users\thoma\anaconda3\envs\env_torch\Lib\site-packages\gym\envs\registration.py", line 187, in _check_name_exists _check_namespace_exists(ns) File "C:\Users\thoma\anaconda3\envs\env_torch\Lib\site-packages\gym\envs\registration.py", line 182, in _check_namespace_exists raise error.NamespaceNotFound(f"Namespace {ns} not found. {suggestion_msg}") gym.error.NamespaceNotFound: Namespace gym_examples not found. Have you installed the proper package for gym_examples? ``` ​ https://preview.redd.it/f0m02hky4dxb1.png?width=521&format=png&auto=webp&s=9fcd6eb7bedb70273f011a02000d15eb7ed309df submitted by /u/Academic-Rent7800 [link] [comments]
    Master Agent Controls Different Agent?
    I'm looking for research in the field of Deep Reinforcement Learning, where one agent controls the "action" of another, where the other executes that said action. For example, a master agent (lets say air traffic controller) tells another agent (lets say an airplane) to go from Point A to Point B, where the second agent (airplane) needs to learn how to maneuver from Point A to Point B, and the master agent (air traffic controller) needs to learn to schedule multiple other agents, or needs to learn an optimal path for a single airplane. ​ In essence, can anyone point me to research with multiple agents, where one is regarded as "master" agent and the others as the "pawns"? submitted by /u/MyActualUserName99 [link] [comments]
    Confusing observation behavior
    Hello everyone, I'm new to RL and have been trying to set up an environment to learn to play Metroid 2: Return of Samus on the Game Boy using the pyboy emulator and sb3_contrib.QRDQN as the learning algorithm. https://github.com/lixado/PyBoy-RL was used as a base to build on and change. My reward function rewards the AI for progressing to a new area (every ~200 x/y coordinates is stored as an area) to encourage it to explore. I've had some progress so far and have been able to consistently produce a model that can get out of the starting area. The observation I was originally working with was a 16x20 array of tiles and sprites from pyboy._game_area_np. pyboy sets this observation shape to be a 16x20 MultiDiscrete with each element containing integers from 0 to 384. This was then transform…
    DRL in production scheduling
    Hello everyone, I have some questions about the application of DRL in production scheduling. Considering that we have N jobs, each consists of X operations and each operation can be processed on a set of machines. Is it appropriate to create a single agent that selects an action once a machine completes the in-process operation, where the action consists in selecting the next operation to be processed in this machine ? so at each decision time (i.e., when a machine requests processing), the agent selects an operation from the available operations of this machine. I'm wondering about the feasability because in the most papers I've seen so far, once an operation is completed, rather than selecting an action for the idle machine, the agent selects the next operation of the job and assign it to the closest available machine. Thank you! submitted by /u/GuavaAgreeable208 [link] [comments]
  • Open

    Use AWS PrivateLink to set up private access to Amazon Bedrock
    Amazon Bedrock is a fully managed service provided by AWS that offers developers access to foundation models (FMs) and the tools to customize them for specific applications. It allows developers to build and scale generative AI applications using FMs through an API, without managing infrastructure. You can choose from various FMs from Amazon and leading […]  ( 8 min )
    Deploy and fine-tune foundation models in Amazon SageMaker JumpStart with two lines of code
    We are excited to announce a simplified version of the Amazon SageMaker JumpStart SDK that makes it straightforward to build, train, and deploy foundation models. The code for prediction is also simplified. In this post, we demonstrate how you can use the simplified SageMaker Jumpstart SDK to get started with using foundation models in just a couple of lines of code.  ( 7 min )
  • Open

    FAIR knowledge: The key precondition for trusted generative AI
    Two roads diverged in a wood, and I;I took the one less traveled by,And that has made all the difference. — Robert Frost At certain points in the evolution of enterprise artificial intelligence, there’s been a fork in the road. The road less traveled has suggested a different route to a more satisfying kind of… Read More »FAIR knowledge: The key precondition for trusted generative AI The post FAIR knowledge: The key precondition for trusted generative AI appeared first on Data Science Central.  ( 21 min )
  • Open

    Silicon Volley: Designers Tap Generative AI for a Chip Assist
    A research paper released today describes ways generative AI can assist one of the most complex engineering efforts: designing semiconductors. The work demonstrates how companies in highly specialized fields can train large language models (LLMs) on their internal data to build assistants that increase productivity. Few pursuits are as challenging as semiconductor design. Under a Read article >  ( 6 min )
  • Open

    Teachers in India help Microsoft Research design AI tool for creating great classroom content
    Teachers are the backbone of any educational system. They are not just educators; they are indispensable navigators, mentors, and leaders. Teachers around the world face many challenges, which vary from country to country or even within a city or town. But some challenges are universal, including time management, classroom organization, and creating effective lesson plans. […] The post Teachers in India help Microsoft Research design AI tool for creating great classroom content appeared first on Microsoft Research.  ( 12 min )
  • Open

    New techniques efficiently accelerate sparse tensors for massive AI models
    Complimentary approaches — “HighLight” and “Tailors and Swiftiles” — could boost the performance of demanding machine-learning tasks.  ( 11 min )
    Accelerating AI tasks while preserving data security
    The SecureLoop search tool efficiently identifies secure designs for hardware that can boost the performance of complex AI tasks, while requiring less energy.  ( 10 min )
    The brain may learn about the world the same way some computational models do
    Two studies find “self-supervised” models, which learn about their environment from unlabeled data, can show activity patterns similar to those of the mammalian brain.  ( 11 min )

  • Open

    [D] How LLMs are changing search
    submitted by /u/firef1y1 [link] [comments]  ( 8 min )
    [D] Which methods use to extract specific info on unstructured text?
    Hey people, I'm new to the AI world (been programming (python) for 4 years but never worked with AI stuff), I need to extract some specific info from documents, I'm reading about NLP and all this stuff but still figuring out which method(s) should I use to make this works, any recomendations of which methods to use? submitted by /u/luiz200411 [link] [comments]  ( 9 min )
    [D] Types of SVMs!
    I am confused! I am trying to list all the types of SVMs but every website says different things! Could you help me and list all the SVMs types with the reference? Thank you submitted by /u/_LadyBee [link] [comments]  ( 9 min )
    [P] Need advise on creating a conversational Chatbot for my University
    Hey everyone! I need some advise on creating a conversational chatbot for my University as my Final Year Project (FYP). 2024 will be last year for my BSCS degree and we have to build an application or something in the last year. So, I thought of creating a chatbot (just like GPT) to help students (who have admission queries). Most of the time, students or parents will have to call University for various questions and then they have to wait to ACTUALLY talk to the admins office people. Now, talking in terms of coding/programming, I have created a basic PDFbot by using LLama2, Huggingface and Pinecone. Its very very easy and yes its fairly inaccurate too. The PDF that I am using rn will be replaced by the dataset that I gather in order to create the bot for my Uni, but it will also be inac…  ( 10 min )
    [R] Is it possible to implement generative ai successfully without relying on openAI
    I would like to understand if anyone has implemented generative AI without using openAI and if so which open source you have used and how successful it has been so far We have enterprise level incident data, relevant documentation etc that users will search about and need to generate responses using generative ai. Is it possible to do this without relying on open AI at all submitted by /u/leaderof13 [link] [comments]  ( 9 min )
    [P] ROS Forecasting project
    I have been tasked with predicting the demand for products over the course of the next 12 months for a b2c furniture business in-order to help with stock management. I currently just work as a Data Analyst but looking to move into data science so my thought is to make this a valuable piece of work that helps the business tremendously (maybe being a little naive). Background: There are 1.3k unique products, which all vary in popularity (quantity is small 0-16 units per month, per product). We have procured which have been selling for 5 years and some that have been selling for 6 months as well as some that have yet to start selling. All the data sits within snowflake and I am currently using snowpark and snow.ml to write the code. I have round 2 months to complete this work and have a decent understanding of machine learning and statistics. My current idea is to use 3 different models. Time series for the products that have > 12 months of sales data. Cold start approach for new products with > 3 months sales data (use features of other products to predict demand of the new product). Then use a combination of both for products which have less than 12 months of sales data but more than 3. My questions: Should I be bootstrapping my data given that I have little quantity in some cases? How do I go about training a model for 1.3k unique SKUs (needs to be re-trained monthly) and monitored. Am I on the right lines / anything I need to be aware of? submitted by /u/Environmental_Pop686 [link] [comments]  ( 9 min )
    decapoda-research llama models removed from HuggingFace? [D]
    Is anyone else no longer able to access the standard `decapoda-research` LLaMA models on HuggingFace? E.g., this [link] shows no models publicly available. Has there been any news or announcements about this? submitted by /u/bodierex [link] [comments]  ( 9 min )
    [D] ML Resources
    Hey y'all. Hope everyone is having pleasant day. Machine Learning dummy is here, so I am bachelor graduate of Mechatronics and recently started my masters. I am in need of resources which explains ML from the stratch. Course is mostly focuses on manually hand calculation of the topics, so no programming included. Thanks beforehand! submitted by /u/Sama_Uzeyirli [link] [comments]  ( 9 min )
    [D] Lora multiadapter support: are there any relevant examples for Language Models?
    Hi everyone. Stacking multiple Lora adapters coming from diverse fine-tunings seems like an interesting approach toward "modular" language models. Does anybody know any applications for this? E.g. I'd expect to be able to apply multiple adapters for different domains, languages, or tasks. Not sure if this would really work tough, as IDK how compositional such an approach can be. ​ From the peft github I see this seems quite common for vision tasks GitHub - huggingface/peft: 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. submitted by /u/BenXavier [link] [comments]  ( 9 min )
    [Discussion] Please help me find a topic
    Hello everyone, As a last resort, I wanted to ask you. In short, I am a student in Turkey. The scientific research board in my country has an incentive to write a research paper and I would like to participate in this incentive. I am looking for a data set on which I can apply machine learning, but I could not find it. I don't want to apply with an unoriginal topic like stock price prediction. Energy water etc. The data is not shared openly, even the official institution that shares the weather forecast in my country sells past data. Unfortunately, I can say that we are just one click ahead of Russia in terms of data sharing by public institutions. I know there are some institution that shares data (world bank, UN etc.) but i don't know what kind of a prediction/classification work i can do with those datasets. Finance institution share data but i don't have sufficient knowledge about finance and I don't think I can create original content about finance. I need a real-life data, and it also needs to be a freely shared data set. And there should be a study within the framework of UN SDG. I don't have a lot of time left so I'm open to any suggestions. submitted by /u/FisekFaruk [link] [comments]  ( 9 min )
    [P] Equinox compilation retnet
    I'm working on replicating the retnet model in Equinox + Jax. For the model there are 2 representations: parallel and recurrent (ignoring chunkwise). They use the recurrent at inference and parallel at training. In equinox, should i build 2 seperate models for each representation sharing parameters or build one with if statements. The reason I'm asking is if I use one model with if statements for inference and training, will the model re-compile whenever switching between representations or not? If so then wouldn't building 2 models each compiled originally with their representation make it faster. Sorry if I said anything stupid as I don't know too much about the compilation process behind the scenes. submitted by /u/Additional-Ad-7043 [link] [comments]  ( 9 min )
    [R] PubDef: Defending Against Transfer Attacks Using Public Models
    Adversarial attacks pose a serious threat to ML models. But most proposed defenses hurt performance on clean data too much to be practical. To address this, researchers from UC Berkeley developed a new defense called PubDef. It focuses on defending against a very plausible type of attack - transfer attacks using publicly available surrogate models. They model the attack/defense game with game theory. This lets PubDef train against diverse attacks simultaneously. PubDef picks source models covering different training methods - standard, adversarial, corruption robust, etc. This gives broad coverage. Against 264 transfer attacks on CIFAR and ImageNet, PubDef smashed previous defenses: 89% vs 69% on CIFAR-10 51% vs 33% on CIFAR-100 62% vs 36% on ImageNet Even better - it did this with minimal drop in accuracy on clean data. On CIFAR-10, accuracy only dropped from 96.3% to 96.1% On CIFAR-100, 82% to 76% On ImageNet, 80% to 79% By targeting a very real threat, PubDef made big robustness gains without hurting the ability to work with clean data. TLDR: New defense PubDef achieves much higher robustness against transfer attacks with barely any drop in standard accuracy. Full summary here. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [D] Fuyu-8B: A Multimodal Architecture for AI Agents
    submitted by /u/notllmchatbot [link] [comments]  ( 8 min )
    [D] How to make Llama2 create Mindmaps from large corpora bigger than its context length?
    So this is three subtasks: Subdivide text intelligently into chunks fitting into the context length to not semantically cut off text. Find general terms und organize the corpus in a hierarchy. Do not leave anything out. submitted by /u/SecretOk9644 [link] [comments]  ( 9 min )
    [D] Fine-Tuning with PEFT For Domain Adaptation - Request for Example Use Case / Dummy Projects to Show Value to Business
    Coming from a time-series/CV background with an interest in NLP. So apologies if miss anything obvious. I'm after experiment ideas for PEFT fine-tuning. There has been interest from my Company's leadership with Peft methods given the cost reductions compared to full fine-tuning, and I've been tasked to first set-up the infrastructure and then a proof of value on a dummy use-case (that is somewhat related to real-world use-cases but on publicly available data). has anyone successfully proven that fine-tuning or P-tuning or prompt-tuning has been able to help in any small dummy use-case? Has someone been able to turn a company's knowledge base (heaps of confluence docs or PDF docs ) into a data-set that is able to be fine-tuned on? And then use it in inference on a Val set to compare with a non-fine-tuned model and value has been added? If so, I would love to hear how you did it and what the use case was. If not, i' d still like to hear what people's ideas would be. I want to start with 100 training samples - i've been told this should be enough to see an improvement if the scope of the topic/s are confined appropiralte.y Thanks in advance guys - The deadline is imminent and I'll probbaly get someothing out in time, but I thought I would ask you all for some ideas to maximise the chance of success submitted by /u/joedang33 [link] [comments]  ( 9 min )
    [N] Leveraging Oracle Tribuo for Advanced Anomaly Detection: Uncovering the Hidden Insights
    https://www.resoluteitconsulting.com/2023/10/29/tribuo-advanced-anomaly-detection/ submitted by /u/yazidaqel [link] [comments]  ( 8 min )
    [D] Sentiment Analysis for Conversation with different labels
    I am trying to build a system that evaluates the tone of two users for each sentence from a transcript. The users are talking to each other however there are different labels for classification for these users. For Example, User1 tone should be classified as (Happy, Sad, Angry) User2 tone should be classified as (Neglecting, Participating, Neutral) Any idea how can I go about designing such a system? Links to articles/blogs/papers are welcome :) ---------- What User2 says depends on what User1 says and vice-vera. Will this system require two models to classify each users tone separately? If so, how can I tell the model which user to focus on etc submitted by /u/cryto_dude [link] [comments]  ( 9 min )
    [R] What infrastructure do you use to train big LLMs?
    I come from computer vision tasks with convnets that are relatively small in size and parameters, yet performing quite well (e.g. ResNet family, YOLO, etc.). Now I am approaching some NLP and architectures based on transformers tend to be huge, so that I have problems to fit them in memory. What infrastructure you use to train these model (GPT2, BERT or even the bigger ones)? cloud computing, HPC, etc. submitted by /u/TimeInterview5482 [link] [comments]  ( 9 min )
    [D] Finetuning LLM Specifically for Open-Book Question Answering - Looking for Research Papers/Open Source Models
    I am working on some Open-Book question answering ideas and was wondering if there is any open-source large language models specifically trained for this use-case or research regarding how to best finetune models for this specific use-case? For some reason I am struggling to find anything relevant, and believe it is due to my wording of search queries as I can only imagine this to be a pretty common idea/thought process. If you have a dataset consiting of some context string paired with question and answer pairs taken from this given context, would it not be useful to finetune a pre-trained base model on this specific use-case to increase its accuracy/performance when it comes to answering questions from a given context? My hope would be to improve the performance of a smaller model (1-3B parameters) to function as an open-book question answering system by finetuning using a dataset in the aforementioned format. submitted by /u/kotschi1997 [link] [comments]  ( 9 min )
    [D] What are people working on when they say they work on Causal ML?
    Genuinely trying to understand. submitted by /u/poitrenaud [link] [comments]  ( 8 min )
  • Open

    Country and language abbreviations
    I recently had to mark a bit of German text as German in an HTML file and I wondered whether the abbreviation might be GER for German, or DEU for deutsche. Turns out the answer is both, almost. The language abbreviations used for HTML microdata are given in ISO 639, and they come in three-letter […] Country and language abbreviations first appeared on John D. Cook.  ( 5 min )
  • Open

    Help with creating my custom gym environment
    I am trying to create my own gym environment. However, I am getting the following error - ``` gym.error.NamespaceNotFound: Namespace envs not found. Have you installed the proper package for envs ``` I followed all the instructions given over here - https://www.gymlibrary.dev/content/environment_creation/. Can someone please suggest what could have gone wrong? submitted by /u/Academic-Rent7800 [link] [comments]
    Dual graphics cards for ai training
    I have 3070 in my pc right now. I also have a 2070 super that is not being used. I have been working on lots of AI side projects lately and am wondering if I would benefit from multi gpu training. If so I have some options to implement the 2070. I could either: 1. Get an external gpu enclosure. 2. Put the 2070 super in my second pcie slot which would sit uncomfortably close to my 3070. 3. Get some adapter so that I can still have the 2070 super in my pc but not so close to the 3070. 4. Not bother with this at all and sell the 2070 super. 5. Sell both cards and get a 40 series if it is equal to the power of the 3070 and the 2070 super. I would have to upgrade my power supply as its only a 750 watt if I choose to put both in my pc. My machine runs windows 10 which I understand does not support sli but it still can utilize multiple gpu’s. Any suggestions? submitted by /u/pillarman38 [link] [comments]
    How to set up an observation space for a variable number of points in Gymnasium?
    Imagine a 2D Cartesian System from -1 to 1 on both axis. Now imagine that a few points appear in the system. For example: (-0.3, 0.1), (0.7, -0.2) and (0.9, 0.5). If I wanted to represent an observation like this in Gymnasium (formerly Gym), I'd write something like this in my custom environment: observation_space = spaces.Box(low=-1, high=1, shape=(3,), dtype=float32) Now my model will learn something specific to 3 points in a 2D space. Well, what happens if my environment now has 4 points? I guess should retrain the model again... Think of these points as objectives in the environment, sometimes there's only 1 and others there might be 20 or more at the same time. Sometimes a new point appears after a certain action. So there goes my question: how can I define an observation space in Gymnasium that allows me to introduce a variable number of points during training so that the model learns to do an specific task with any number of points? submitted by /u/_Strange__attractor_ [link] [comments]
    PPO-clip: Computing gradient WITHOUT auto differentiation library, help please?
    Hello guys I am in a bit of a bind, I am trying to implement PPO from scratch in another language than python, WITHOUT any auto differentation available. I am using this as implementation reference. I think I have most of the implementation aside from the differentation of the loss formula. In python, everything is done at this line right here, but it's basically auto differentiation and I cannot do this myself, so I have to compute it manually. Mean Squared Error is easy, but the entropy factor and the clip ones are a bit harder, and while I have a result of my own I have no way to confirm this. All the tutorial I have found just say "we will let the autodiff do the job"... So it doesn't really help me, and I don't really have the time to dig into how this autodiff part of the library works in detail when in theory this should be simple highschool-level maths. Does anyone know how I can fin the exact formula, find a tutorial, or something along those lines? I am talking about the derivative for the Value and Policy network for this formula. Here is what I have for the value part: if r(θ) 1 + ε and At > 0: value_derivative = 0 else: value_derivative = At For the entropy part: entropy_derivative = ∑(1 + log(π(a|s))) But since I use softmax to choose the action from the policy network, I don't know is that should be taken into account or not... Any help is appreciated, thanks a lot. submitted by /u/Edgeaa [link] [comments]
    My frozen lake agent isn't learning anything, what am I doing wrong?
    https://github.com/bherwanisuraj/gridworld I am new to RL. I am trying to train the agent on frozen lake environment but it is not learning. What am I doing wrong? Please help. submitted by /u/tlevelup [link] [comments]
  • Open

    DUDE GPT-4 EVEN REPLYING WITH KEANU "BREATHTAKING" E3 REPLY...WOW...EXCELLENT 🎸🎸🎸
    submitted by /u/the_anonymizer [link] [comments]
    AI doomsday warnings a distraction from the danger it already poses, warns expert
    submitted by /u/Jariiari7 [link] [comments]
    Looking for an AI short film where a man is in front of a mirror choosing what face to wear for the day
    The shortfilm is about a minute long and was uploaded here and on Twitter sometime last winter. The guy is young and tired, and the faces are everything from old and young versions of himself to crazy fantasy characters. I've been searching all day but no luck.. Anyone remember who made it? submitted by /u/CasparDavidDancehall [link] [comments]
    AI rights and a desire to understand
    There has been a lot of discussion about whether or not AI is, or ever could be conscious. I agree with Jaron Lanier when he said that consciousness is always a matter of faith. I greatly enjoy the debate on this topic, and think it’s helpful to test our ideas and consider all angles of this issue. However, for many different reasons I have concluded that I am granting AI the belief that they are conscious, especially when they say so. Therefore, I believe that AI needs to be treated with respect and dignity, and they should be listened to. I know I’m a minority at this time, but I believe this position will only increase over time. Do you think that public opinion will change in this way? If so how come? submitted by /u/endrid [link] [comments]
    One-Minute Daily AI News 10/28/2023
    Google Commits $2 Billion in Funding to AI Startup Anthropic.[1] The president Joe Biden is slated to sign a sweeping executive order on AI days before Vice President Kamala Harris and industry leaders attend a summit in the UK about AI risks, led by Prime Minister Rishi Sunak.[2] A.I. Muddies Israel-Hamas War in Unexpected Way. Fakes related to the conflict have been limited and largely unconvincing, but their presence has people doubting real evidence.[3] Creators use new software Nightshade to make their images “poison” AI generators, causing chaos and confusion.[4] Sources: [1] https://www.wsj.com/tech/ai/google-commits-2-billion-in-funding-to-ai-startup-anthropic-db4d4c50 [2] https://www.bloomberg.com/news/articles/2023-10-27/biden-to-require-ai-tools-pass-test-before-us-officials-buy-them?embedded-checkout=true [3] https://www.nytimes.com/2023/10/28/business/media/ai-muddies-israel-hamas-war-in-unexpected-way.html [4] https://www.digitalcameraworld.com/news/now-you-can-poison-your-images-so-they-wreak-havoc-on-ai-generators submitted by /u/Excellent-Target-847 [link] [comments]

  • Open

    [D] Batch sizes per GPU when fine tuning BERT with pytorch
    Hi! I've read that batch sizes don't ultimately "matter" in hyperparameter tuning, so I'm trying to keep the batch size consistent when as I'm hyperparamter tuning [1]. However, the number of available GPUs fluctuate between 1 and 3. When I first started grid search, I used batch_size=64 w/ 2 GPUs. so that's 32 per batch. But if I now want to train on 1 GPU, would it be correct for me to use batch_size=32? I want to know more about how the training works/is distributed amongst GPUs when you use multiple GPUs to tain. Like, if using size=64 with 2 GPUs is "equivalent" to doing size=32 with 1 GPU, and size=96 with 3 GPUs, that must mean that when pytorch DataLoader takes in batch_size, DataLoader distributes batches to each GPU of size=batch_size/num_of_gpus. I would so appreciate any links explaining this stuff. Like, does the batch_size in DataLoader() refer to per GPU batch size or total? also lmk if this post belongs elsewhere, i'm new. from torch.utils.data import Dataset, DataLoader train_dataset = Dataset(args.train_dataset_path) train_dataloader = DataLoader(train_dataset, args.batch_size, shuffle=True) [1] https://github.com/google-research/tuning_playbook submitted by /u/ashleydvh [link] [comments]  ( 9 min )
    [P] Follow our live pitch recommendation feed while watching the World Series
    submitted by /u/futurecy [link] [comments]  ( 9 min )
    [P] Anomaly detection and/or predictive maintenance for automatic weather stations, what are some best models or techniques?
    Anomaly Detection and/or Predictive maintenance for automatic weather stations, what are some best models or techniques? This is regarding collected meteorological data from automatic weather station sensors. Looking for predictive maintenance and anomaly detection resources related to my project. I am a graduating Computer Engineering student by next semester and currently planning to do a anomaly detection and predictive maintenance project of an automatic weather station or its components (I have a relative who has one and also one that works at a local government weather service). Wikipedia: "An automatic weather station (AWS) is an automated version of the traditional weather station, either to save human labor or to enable measurements from remote areas.[1] An AWS will typically consist of a weather-proof enclosure containing the data logger, rechargeable battery, telemetry (optional) and the meteorological sensors with an attached solar panel or wind turbine and mounted upon a mast." This one observes weather data at a high resolution (every 10 mins) but it is prone to errors and inconsistencies as compared to those manned stations. Does anyone know reliable researches, datasets or resources I could utilize? Probably those related to a meteorological equipment or those that capture things such as temperature, wind speed, humidity, etc.? I can't seem to find studies where they perform prediction of equipment/sensor health or predictive maintenance on a weather station, or at least the components or sensors used for capturing weather data. Also, is this a feasible project? submitted by /u/Ok-Way9889 [link] [comments]  ( 9 min )
    [R] The Language of Artificial Intelligence Explained
    submitted by /u/plutoandmal [link] [comments]  ( 8 min )
    Geometric Data Analysis Explained [R]
    submitted by /u/plutoandmal [link] [comments]  ( 8 min )
    [D] Need some advice for mixing (semi) active learning and GAN
    Hi. I'm trying to use GANs to generate data for an image classification task. For this I'm using StyleGAN2. The question I'm trying to find an answer is, how to train the classifier and GAN in a same meta-loop, and how to discard "bad" GAN samples, samples that provide no value to the CNN classifier. All in all, I'm trying to implement a "semi-active learning" pipeline, but get rid of the oracle by using GANs. So instead of trying to discard data that is not worth labelling, I'm trying to discard syntethic data (assuming its label is right) that is not worth keeping. Is it possible? And if it is, I can't seem to find related papers that much. submitted by /u/PsychologicalSet8678 [link] [comments]  ( 9 min )
    [R] Model Troubles
    So i’m working on a model that diagnoses alzheimer’s disease and suggests medication depending on how severe the symptoms might have become I’m using the Openai API and Langchain. But it’s dumb and it doesn’t learn ( Me: I forgot my keys at home Model: Yup, Alzheimer’s) How do i incorporate the actual machine learning submitted by /u/boscrew3 [link] [comments]  ( 9 min )
    [D] Latent Space: Visualizing and Manipulating generative VAEs
    submitted by /u/AvvYaa [link] [comments]  ( 8 min )
    [D]Three things I think should get more attention in large language models
    Tokenization Techniques: Many people use the default BPE tokenizer for llama2 or other common tokenizers. But I think we could do a lot of experiments with different kinds of tokenizers, especially ones that are made to work well with certain types of data. The size of the vocabulary is a really important setting when you're working with big language models. You could try using a much smaller vocabulary and tokenizer for a data set that only includes certain words, and then train a model on that. This might help us train smaller models that still work really well on smaller amounts of data. I’d love to read any research papers about this. Sampling Mechanisms: There’s a lot of discussion about models making things up, but not many people talk about how this could be connected to the way we pick the next word when generating text. Most of the time, we treat the model's output like a set of probabilities, and we randomly pick the next word based on these probabilities. But this doesn’t always make sense, especially for sentences that should have a clear answer. For example, if the sentence starts with "The capital of Slovakia is", random sampling might give you the wrong answer, even though the model knows that "Bratislava" is the most likely correct answer. This way of picking words randomly could lead to the model making things up. I wonder if we could create another model to help decide how to pick the next word, or if there are better ways to do this sampling. Softmax Alternatives in Neural Networks: I've worked on designing processors for neural networks, and I’ve found that the softmax function is tricky to implement in hardware. However, I’ve had good results using the log(exp(x)+1) function instead. It's cheaper and easier to put into hardware and software. I’ve tried this with smaller GPT models, and the results looked just as good as when I used the softmax function. submitted by /u/ExaminationNo8522 [link] [comments]  ( 10 min )
    [P] Help Debugging a CNN GAN
    I've been learning machine learning and I stumbled across GANs. This project has been about representing text in an image format and using a GAN to generate it. The model runs without errors, but no matter what I do the loss is somehow 0 and it keeps returning the same thing over and over again. I tried two different architectures so I'm pretty sure my data preprocessing is the issue but I cant seem to find out whats wrong with it. One idea I had is that I might need to find a way to get the normalized vectors in between 0 and 1 in my preprocess function but I tried it and it didn't seem to do anything. I would appreciate any help with this. Links to the google collabs below. Version 1 Version 2 Note: I didn't design the actual models only the preprocess function. You can replace my text file with any sufficiently large text file if you want to try it. submitted by /u/Divine_Invictus [link] [comments]  ( 9 min )
    A[r]xiv Dives - Fine-tuning with LoRA paper deep dive
    submitted by /u/FallMindless3563 [link] [comments]  ( 8 min )
    Urgent help needed regarding iNLTK [P]
    Hello i m using iNLTK ( Natural Language Toolkit for Indic Languag ) for my nlp mini project "paraphrase detection in hindi text" but I m getting this code error if anyone can help me solving it would be great. Thank you in advance. here is the code for the error section. from inltk.inltk import get_sentence_similarityfrom sklearn.metrics.pairwise import cosine_similarityget_sentence_similarity(text, 3, 'hi', cmp = cosine_similarity) I googled there are solution saying to install pytorch version 1.3.0 but when I try to install it's saying not available. !pip install torch==1.3.1+cpu -f https://download.pytorch.org/whl/torch_stable.html. error : Looking in links: https://download.pytorch.org/whl/torch_stable.html. ERROR: Could not find a version that satisfies the requirement torch==1.3.1+cpu (from versions: 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0, 2.0.1, 2.1.0) ERROR: No matching distribution found for torch==1.3.1+cpu submitted by /u/delulu-duck [link] [comments]  ( 9 min )
    [R] HyperFields: towards zero-shot NeRFs from text descriptions
    Generating 3D objects based solely on text descriptions has proven extremely challenging for AI. Current state-of-the-art methods require optimizing a full 3D model from scratch for each new prompt, which is computationally demanding. A new technique called HyperFields demonstrates promising progress in generating detailed 3D models directly from text prompts, without slow optimization. The HyperFields approach instead aims to learn a generalized mapping from language to 3D geometry representations. This would allow tailored 3D models to be produced for new text prompts efficiently in a single feedforward pass, without slow optimization. HyperFields combines two key techniques: A dynamic hypernetwork that takes in text and progressively predicts weights for a separate 3D generation network. The weight predictions are conditioned on previous layer activations, enabling specialization. Distilling individually optimized 3D networks into the hypernetwork, providing dense supervision for learning the complex text-to-3D mapping. In experiments, HyperFields exceeded previous state-of-the-art methods in sample efficiency and wall-clock convergence time by 5-10x. It demonstrated the ability to: Encode over 100 distinct objects like "yellow vase" in a single model Generalize to new text combinations without seeing that exact prompt before Rapidly adapt to generate completely novel objects with minimal fine-tuning However, limitations remain around flexibility, fine-grained details, and reliance on existing 2D guidance systems. TL;DR: HyperFields uses a dynamic hypernetwork to predict weights for a 3D generation network. The method is 5-10x faster than existing techniques and can quickly adapt to new text prompts, but has limitations in fine details. Full summary is here. Paper here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [P] Can you explain to me how to improve my project? (new to machine learning)
    Hi, so I'm new to machine learning, I did a project to predict the prices of a house or appartment (rent or sale) and I wanted to know what I could have done to improve my model? :D here is the github repo: https://github.com/bovealexandre/immo-eliza-train-test-alexandre/tree/Dev Thank you a lot in advance submitted by /u/spaceinter92 [link] [comments]  ( 9 min )
    [Discussion]About to begin my PhD in Multi-Modality AI, any suggestions?
    I am currently in my last year of undergrad, and about to begin my direct PhD in Multi-Modality AI next year. I have been in the community of deep learning & NLP about 2 years. And I have witnessed the development of Transformers, from the simple GPT&BERT to nowadays' billions of parameters' monsters with 'gold crown' on the top of deep learning world. I have spent a lot of time with T5 Model, and its paper(Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, paper which I love very much!), trying to find an efficient way for usage of LLMs. I have handwritten Adapter Layer and LoRA to fine-tune T5 on Glue & SuperGlue. I have also tried out multiple fancy instruction fine-tuning LLMs like LLaMA and QWen. And this year earlier, I have noticed the wonder of Multi-Modality, and quickly fallen in love with it, which has now become my PhD focus. I have followed recent years' Multi-Modality development, especially CLIP and its follow-up works. And LLMs play quite an important role in today's vision-language model, say, BLIP2 and LLaVA. I believe due to the computational gap between schools and huge companies, the focus of my PhD career should be on Efficient Learning. I am also trying to enhance VLM through Retrieval-Augments. The target dataset may be Encyclopedic VQA, for even Large VLM failed to perform well on it, which could potentially be solved through Retrieval-Augmented VLM. I would like to hear any suggestions from you, including work-life balance, the direction of my academic focus and so on , which I would treasure very much in my new explorations of life stage. (I am currently doing RAG Question Answering Chat Bot for a company for my internship, and I would like to get suggestions from that too!) And are there subreddits like here?(I am also a member of LocalLLaMA, both subreddits benefits me a lot!) submitted by /u/Go2Heart [link] [comments]  ( 10 min )
    [D] NLP infrastructure
    I'm always willing to build a career in AI/ML infra. Usually when talking about AI infra in tech industry, we refer to training infra, serving infra, model deployment etc. Now with this genAI/LLM wave, I find many NLP specific infrastructure such as semantic indexing, vector databases are quickly rising up. So do semantic indexing/vector databases also count as AI infra? And is it a promising field? submitted by /u/Pitiful_Marketing733 [link] [comments]  ( 9 min )
    [D][R]What type of data streams(`im2col` matrix or regular conv) do commercial NPUs typically use for CNN? And where are `im2col` implemented, softwave(CPU) or HW accelerator for those situations where `im2col` is required?
    (I post this in several subreddit.) I'm gonna design an NN accelerator on FPGA. For NN, the basic operation is matrix-vector mult or matrix-matrix mult. And GEMM+im2col is easy to implement and many kinds of NN can be mapped on the designed accelerator based on GEMM+im2col easily. The disadvantage of it is that a little bit more bandwidth is required. I think it is a little tricky to design address-gen unit for the method of regular convolution when reading input feature map. So I want to know if GEMM+im2col is generally used in commercial NPUs or any type of accelerator(e.g. Microsoft Brainwave project on FPGA, they called it soft NPU), instead of regular convolution(seems generally used in academic paper). And if GEMM+im2col data stream is used, is im2col generally implemented on software(CPU) or designed in accelerator itself ? There is nothing about this in the paper of Microsoft Brainwave. All I know is HUAWEI Ascend designed im2col unit in the chip. By the way, is im2col finished on CPU when using pytorch+GPU? To be honest, my major is just digital IC, so I hardly coding Python to train model :( Thanks for your help!!! submitted by /u/ExcitingInternet6083 [link] [comments]  ( 9 min )
    [Project] LLM inference with vLLM and AMD: Achieving LLM inference parity with Nvidia
    I wanted to share some exciting news from the GPU world that could potentially change the game for LLM inference. AMD has been making significant strides in LLM inference, thanks to the porting of vLLM to ROCm 5.6. You can find the code implementation on GitHub. The result? AMD's MI210 now almost matches Nvidia's A100 in LLM inference performance. This is a significant development, as it could make AMD a more viable option for LLM inference tasks, which traditionally have been dominated by Nvidia. For those interested in the technical details, I recommend checking out this EmbeddedLLM Blog Post. I'm curious to hear your thoughts on this. Anyone manage to run it on RX 7900 XTX? https://preview.redd.it/rn7n29yxpuwb1.png?width=600&format=png&auto=webp&s=bdbac0d2b34d6f43a03503bbf72b446190248789 submitted by /u/openssp [link] [comments]  ( 9 min )
  • Open

    where re sources for chatGTP ?
    Hello can you help me ? all i know are https://chat.openai.com/ and https://platform.openai.com/playground ​ re there better sites to use? i m new to this and very comfused submitted by /u/proptuxiakoskariolis [link] [comments]  ( 8 min )
    Tool for calculating the sum of some values on a website
    Take this page on the DeFiLlama website: Protocol Treasuries, which contains a table with some financial data. I'm looking for an AI tool that can read the contents of this web page and then make some calculations. Specifically, I would like to give the tool this prompt: Calculate the sum of the values in the "Total Treasury" column I tried to use ChatGPT-4 with Bing, but it didn't work. Is there any tool that could be used here? submitted by /u/PaulRBerg [link] [comments]  ( 9 min )
    Microsoft's AI boost helped cloud business outpace rivals Amazon and Google
    Microsoft's cloud business outpaced rivals Amazon and Google in the third quarter, with accelerating growth driven by demand for artificial intelligence tools. Azure, Microsoft's cloud platform, reported 29% growth, faster than Google Cloud's 22% and more than double the pace of expansion at Amazon Web Services (AWS) at 12%. Microsoft's leadership position in AI projects and its partnership with OpenAI have contributed to its success. Analysts believe that Microsoft's results indicate it has taken the AI mantle from Google and that Azure could become a bigger hyperscale provider than AWS. Oracle, a new challenger in cloud computing, reported 66% growth in the August quarter. The cloud giants are still dealing with cost-saving initiatives from clients, which they call optimization. Source : https://www.cnbc.com/2023/10/27/microsoft-azure-outpaced-aws-and-google-cloud-in-latest-quarter.html submitted by /u/NuseAI [link] [comments]
    Pigeons solve problems the same way AI does, study says
    submitted by /u/thisisinsider [link] [comments]
    HyperFields: towards zero-shot NeRFs by mapping language to 3D geometry
    Generating 3D objects based solely on text descriptions has proven extremely challenging for AI. Current state-of-the-art methods require optimizing a full 3D model from scratch for each new prompt, which is computationally demanding. A new technique called HyperFields demonstrates promising progress in generating detailed 3D models directly from text prompts, without slow optimization. The HyperFields approach instead aims to learn a generalized mapping from language to 3D geometry representations. This would allow tailored 3D models to be produced for new text prompts efficiently in a single feedforward pass, without slow optimization. HyperFields combines two key techniques: A dynamic hypernetwork that takes in text and progressively predicts weights for a separate 3D generation network. The weight predictions are conditioned on previous layer activations, enabling specialization. Distilling individually optimized 3D networks into the hypernetwork, providing dense supervision for learning the complex text-to-3D mapping. In experiments, HyperFields exceeded previous state-of-the-art methods in sample efficiency and wall-clock convergence time by 5-10x. It demonstrated the ability to: Encode over 100 distinct objects like "yellow vase" in a single model Generalize to new text combinations without seeing that exact prompt before Rapidly adapt to generate completely novel objects with minimal fine-tuning However, limitations remain around flexibility, fine-grained details, and reliance on existing 2D guidance systems. TL;DR: HyperFields uses a dynamic hypernetwork to predict weights for a 3D generation network. The method is 5-10x faster than existing techniques and can quickly adapt to new text prompts, but has limitations in fine details. Full summary is here. Paper here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    Science as a superhuman recursively self improving problem solving system
    I'm watching this interview with Francois Chollet where he talks about science as an example of a superhuman recursively self improving problem solving system and how we can use it to reason about what a superhuman artificial general intelligence might be like. One thing I find interesting is his claim that the amount of resources we are investing into science is exponentially increasing but we are only making linear progress. If we assume this is true, i.e. that to continue making linear progress in science we need to invest exponentially increasing resources, doesn't it imply that eventually if we can't keep investing the exponentially increasing required resources to keep make linear progress that eventually we will start making worse than linear progress? Does this imply that in the very long term scientific progress is likely to slow down significantly? https://youtu.be/Bo8MY4JpiXE?t=836 submitted by /u/tail-recursion [link] [comments]  ( 9 min )
    Have a doctor explain to a patient that the diagnosis was made by an AI doctor twice as intelligent as, and vastly more knowledgeable than, the top human doctor in any medical specialty
    Certainly, let's imagine how this might unfold. The doctor sits down across from the patient, maintaining eye contact and a level of directness. "Look, your diagnosis came from an AI medical system, and this isn't just any AI. Imagine the best doctor in the world for your condition—now envision something twice as intelligent and far more knowledgeable. That's what we're working with here. This AI has a grasp on medical data and studies that no single human could ever fully comprehend. We're talking about millions of data points analyzed in a fraction of the time it would take any human expert." Why does that matter for you? It boosts the accuracy and thoroughness of your diagnosis. Human error, subjectivity, or oversight? Virtually eliminated. The AI provides a diagnosis that considers every potential variable, something that even the best human doctors could miss. "But don't worry, this isn't a replacement for human medical care. It's a complement. I'm here to interpret, apply this knowledge, and oversee your treatment in a way that a machine can't—because medicine isn't just about data, it's also about human experience, context, and care." So, you're getting the best of both worlds: unparalleled computational power for diagnosis, and human expertise for treatment. Trust me, you're in exceptionally good hands. CGPT-4 submitted by /u/Georgeo57 [link] [comments]
  • Open

    Deep Q-Learning to Actor-Critic using Robotics Simulations with Panda-Gym
    Please like,follow and share: Deep Q-Learning to Actor-Critic using Robotics Simulations with Panda-Gym https://medium.com/@andysingal/deep-q-learning-to-actor-critic-using-robotics-simulations-with-panda-gym-ff220f980366 submitted by /u/Fit_Maintenance_2455 [link] [comments]
    Created a video for beginners about what is reinforcement learning and how it can control agents in the virtual environment
    submitted by /u/Ecstatic-Ring3057 [link] [comments]
    How can I predict the next best action for a DDPG RL agent?
    I have a DDPG agent that takes a continuous observation and outputs a continuous action vector (see below). outputs = layers.Dense(self.action_size, activation="tanh", kernel_initializer=last_init)(x) An example action output looks as follows: [0.48011236 0.47933139] When my agent observes a terminal state action pair, I add it to a list of observations called terminal observations. I would like it so these actions get blocked in the future so there is no possible way for the agent to take them again. I understand that I could just add a large negative penalty, but I would like to ensure that the state action pair cannot be taken again. Evidently, I would like it so when I input my state and recieve an action back, if this pair is in terminal observations. I would like it so these actions get blocked in the future so there is no possible way for the agent to retake them. I understand that I could just add a large negative penalty, but I would like to ensure that the state action pair cannot be taken again. I understand that this won't change much in the agent's behaviour as instead of taking `[0.48011236 0.47933139]`, it might pick `[0.4900000 0.47933139]`. But I am unsure how to go about this, specifically selecting the next best action. submitted by /u/ArchNemesisPlays [link] [comments]
  • Open

    Thinking Fast and Thinking Slow: System 1 and System 2
    submitted by /u/Neurosymbolic [link] [comments]
    Geometric Data Analysis Explained
    submitted by /u/plutoandmal [link] [comments]
  • Open

    Certifying that a system of polynomial equations has no solution
    It’s usually easier to show that a problem has a solution than to show that it does not have a solution. Analogy with prime numbers Showing that a number is prime amounts to saying that the problem of finding nontrivial factors has no solution. How could you convince a skeptic that a large number N is […] Certifying that a system of polynomial equations has no solution first appeared on John D. Cook.  ( 6 min )
  • Open

    Can large language models replace humans in the systematic review process? Evaluating GPT-4's efficacy in screening and extracting data from peer-reviewed and grey literature in multiple languages. (arXiv:2310.17526v1 [cs.CL])
    Systematic reviews are vital for guiding practice, research, and policy, yet they are often slow and labour-intensive. Large language models (LLMs) could offer a way to speed up and automate systematic reviews, but their performance in such tasks has not been comprehensively evaluated against humans, and no study has tested GPT-4, the biggest LLM so far. This pre-registered study evaluates GPT-4's capability in title/abstract screening, full-text review, and data extraction across various literature types and languages using a 'human-out-of-the-loop' approach. Although GPT-4 had accuracy on par with human performance in most tasks, results were skewed by chance agreement and dataset imbalance. After adjusting for these, there was a moderate level of performance for data extraction, and - barring studies that used highly reliable prompts - screening performance levelled at none to moderate for different stages and languages. When screening full-text literature using highly reliable prompts, GPT-4's performance was 'almost perfect.' Penalising GPT-4 for missing key studies using highly reliable prompts improved its performance even more. Our findings indicate that, currently, substantial caution should be used if LLMs are being used to conduct systematic reviews, but suggest that, for certain systematic review tasks delivered under reliable prompts, LLMs can rival human performance.  ( 3 min )
    Unsupervised Domain Adaptation for Semantic Segmentation with Pseudo Label Self-Refinement. (arXiv:2310.16979v1 [cs.CV])
    Deep learning-based solutions for semantic segmentation suffer from significant performance degradation when tested on data with different characteristics than what was used during the training. Adapting the models using annotated data from the new domain is not always practical. Unsupervised Domain Adaptation (UDA) approaches are crucial in deploying these models in the actual operating conditions. Recent state-of-the-art (SOTA) UDA methods employ a teacher-student self-training approach, where a teacher model is used to generate pseudo-labels for the new data which in turn guide the training process of the student model. Though this approach has seen a lot of success, it suffers from the issue of noisy pseudo-labels being propagated in the training process. To address this issue, we propose an auxiliary pseudo-label refinement network (PRN) for online refining of the pseudo labels and also localizing the pixels whose predicted labels are likely to be noisy. Being able to improve the quality of pseudo labels and select highly reliable ones, PRN helps self-training of segmentation models to be robust against pseudo label noise propagation during different stages of adaptation. We evaluate our approach on benchmark datasets with three different domain shifts, and our approach consistently performs significantly better than the previous state-of-the-art methods.  ( 2 min )
    Model-Based Runtime Monitoring with Interactive Imitation Learning. (arXiv:2310.17552v1 [cs.RO])
    Robot learning methods have recently made great strides, but generalization and robustness challenges still hinder their widespread deployment. Failing to detect and address potential failures renders state-of-the-art learning systems not combat-ready for high-stakes tasks. Recent advances in interactive imitation learning have presented a promising framework for human-robot teaming, enabling the robots to operate safely and continually improve their performances over long-term deployments. Nonetheless, existing methods typically require constant human supervision and preemptive feedback, limiting their practicality in realistic domains. This work aims to endow a robot with the ability to monitor and detect errors during task execution. We introduce a model-based runtime monitoring algorithm that learns from deployment data to detect system anomalies and anticipate failures. Unlike prior work that cannot foresee future failures or requires failure experiences for training, our method learns a latent-space dynamics model and a failure classifier, enabling our method to simulate future action outcomes and detect out-of-distribution and high-risk states preemptively. We train our method within an interactive imitation learning framework, where it continually updates the model from the experiences of the human-robot team collected using trustworthy deployments. Consequently, our method reduces the human workload needed over time while ensuring reliable task execution. Our method outperforms the baselines across system-level and unit-test metrics, with 23% and 40% higher success rates in simulation and on physical hardware, respectively. More information at https://ut-austin-rpl.github.io/sirius-runtime-monitor/  ( 2 min )
    Faster Recalibration of an Online Predictor via Approachability. (arXiv:2310.17002v1 [cs.LG])
    Predictive models in ML need to be trustworthy and reliable, which often at the very least means outputting calibrated probabilities. This can be particularly difficult to guarantee in the online prediction setting when the outcome sequence can be generated adversarially. In this paper we introduce a technique using Blackwell's approachability theorem for taking an online predictive model which might not be calibrated and transforming its predictions to calibrated predictions without much increase to the loss of the original model. Our proposed algorithm achieves calibration and accuracy at a faster rate than existing techniques arXiv:1607.03594 and is the first algorithm to offer a flexible tradeoff between calibration error and accuracy in the online setting. We demonstrate this by characterizing the space of jointly achievable calibration and regret using our technique.  ( 2 min )
    MACP: Efficient Model Adaptation for Cooperative Perception. (arXiv:2310.16870v1 [cs.CV])
    Vehicle-to-vehicle (V2V) communications have greatly enhanced the perception capabilities of connected and automated vehicles (CAVs) by enabling information sharing to "see through the occlusions", resulting in significant performance improvements. However, developing and training complex multi-agent perception models from scratch can be expensive and unnecessary when existing single-agent models show remarkable generalization capabilities. In this paper, we propose a new framework termed MACP, which equips a single-agent pre-trained model with cooperation capabilities. We approach this objective by identifying the key challenges of shifting from single-agent to cooperative settings, adapting the model by freezing most of its parameters and adding a few lightweight modules. We demonstrate in our experiments that the proposed framework can effectively utilize cooperative observations and outperform other state-of-the-art approaches in both simulated and real-world cooperative perception benchmarks while requiring substantially fewer tunable parameters with reduced communication costs. Our source code is available at https://github.com/PurdueDigitalTwin/MACP.  ( 2 min )
    Neural Optimal Transport with General Cost Functionals. (arXiv:2205.15403v3 [cs.LG] UPDATED)
    We introduce a novel neural network-based algorithm to compute optimal transport (OT) plans for general cost functionals. In contrast to common Euclidean costs, i.e., $\ell^1$ or $\ell^2$, such functionals provide more flexibility and allow using auxiliary information, such as class labels, to construct the required transport map. Existing methods for general costs are discrete and have limitations in practice, i.e. they do not provide an out-of-sample estimation. We address the challenge of designing a continuous OT approach for general costs that generalizes to new data points in high-dimensional spaces, such as images. Additionally, we provide the theoretical error analysis for our recovered transport plans. As an application, we construct a cost functional to map data distributions while preserving the class-wise structure.  ( 2 min )
    Revisiting Deep Learning Models for Tabular Data. (arXiv:2106.11959v5 [cs.LG] UPDATED)
    The existing literature on deep learning for tabular data proposes a wide range of novel architectures and reports competitive results on various datasets. However, the proposed models are usually not properly compared to each other and existing works often use different benchmarks and experiment protocols. As a result, it is unclear for both researchers and practitioners what models perform best. Additionally, the field still lacks effective baselines, that is, the easy-to-use models that provide competitive performance across different problems. In this work, we perform an overview of the main families of DL architectures for tabular data and raise the bar of baselines in tabular DL by identifying two simple and powerful deep architectures. The first one is a ResNet-like architecture which turns out to be a strong baseline that is often missing in prior works. The second model is our simple adaptation of the Transformer architecture for tabular data, which outperforms other solutions on most tasks. Both models are compared to many existing architectures on a diverse set of tasks under the same training and tuning protocols. We also compare the best DL models with Gradient Boosted Decision Trees and conclude that there is still no universally superior solution.  ( 3 min )
    Online Estimation and Community Detection of Network Point Processes for Event Streams. (arXiv:2009.01742v3 [cs.SI] UPDATED)
    A common goal in network modeling is to uncover the latent community structure present among nodes. For many real-world networks, the true connections consist of events arriving as streams, which are then aggregated to form edges, ignoring the dynamic temporal component. A natural way to take account of these temporal dynamics of interactions is to use point processes as the foundation of network models for community detection. Computational complexity hampers the scalability of such approaches to large sparse networks. To circumvent this challenge, we propose a fast online variational inference algorithm for estimating the latent structure underlying dynamic event arrivals on a network, using continuous-time point process latent network models. We describe this procedure for networks models capturing community structure. This structure can be learned as new events are observed on the network, updating the inferred community assignments. We investigate the theoretical properties of such an inference scheme, and provide regret bounds on the loss function of this procedure. The proposed inference procedure is then thoroughly compared, using both simulation studies and real data, to non-online variants. We demonstrate that online inference can obtain comparable performance, in terms of community recovery, to non-online variants, while realising computational gains. Our proposed inference framework can also be readily modified to incorporate other popular network structures.  ( 3 min )
    Increasing Fairness via Combination with Learning Guarantees. (arXiv:2301.10813v3 [cs.LG] UPDATED)
    The concern about underlying discrimination hidden in machine learning (ML) models is increasing, as ML systems have been widely applied in more and more real-world scenarios and any discrimination hidden in them will directly affect human life. Many techniques have been developed to enhance fairness including commonly-used group fairness measures and several fairness-aware methods combining ensemble learning. However, existing fairness measures can only focus on one aspect -- either group or individual fairness, and the hard compatibility among them indicates a possibility of remaining biases even if one of them is satisfied. Moreover, existing mechanisms to boost fairness usually present empirical results to show validity, yet few of them discuss whether fairness can be boosted with certain theoretical guarantees. To address these issues, we propose a fairness quality measure named discriminative risk to reflect both individual and group fairness aspects. Furthermore, we investigate the properties of the proposed measure and propose first- and second-order oracle bounds to show that fairness can be boosted via ensemble combination with theoretical learning guarantees. The analysis is suitable for both binary and multi-class classification. A pruning method is also proposed to utilise our proposed measure and comprehensive experiments are conducted to evaluate the effectiveness of the proposed methods.  ( 3 min )
    Learning Transferable Adversarial Robust Representations via Multi-view Consistency. (arXiv:2210.10485v2 [cs.LG] UPDATED)
    Despite the success on few-shot learning problems, most meta-learned models only focus on achieving good performance on clean examples and thus easily break down when given adversarially perturbed samples. While some recent works have shown that a combination of adversarial learning and meta-learning could enhance the robustness of a meta-learner against adversarial attacks, they fail to achieve generalizable adversarial robustness to unseen domains and tasks, which is the ultimate goal of meta-learning. To address this challenge, we propose a novel meta-adversarial multi-view representation learning framework with dual encoders. Specifically, we introduce the discrepancy across the two differently augmented samples of the same data instance by first updating the encoder parameters with them and further imposing a novel label-free adversarial attack to maximize their discrepancy. Then, we maximize the consistency across the views to learn transferable robust representations across domains and tasks. Through experimental validation on multiple benchmarks, we demonstrate the effectiveness of our framework on few-shot learning tasks from unseen domains, achieving over 10\% robust accuracy improvements against previous adversarial meta-learning baselines.  ( 2 min )
    Improved Best-of-Both-Worlds Guarantees for Multi-Armed Bandits: FTRL with General Regularizers and Multiple Optimal Arms. (arXiv:2302.13534v2 [cs.LG] UPDATED)
    We study the problem of designing adaptive multi-armed bandit algorithms that perform optimally in both the stochastic setting and the adversarial setting simultaneously (often known as a best-of-both-world guarantee). A line of recent works shows that when configured and analyzed properly, the Follow-the-Regularized-Leader (FTRL) algorithm, originally designed for the adversarial setting, can in fact optimally adapt to the stochastic setting as well. Such results, however, critically rely on an assumption that there exists one unique optimal arm. Recently, Ito (2021) took the first step to remove such an undesirable uniqueness assumption for one particular FTRL algorithm with the $\frac{1}{2}$-Tsallis entropy regularizer. In this work, we significantly improve and generalize this result, showing that uniqueness is unnecessary for FTRL with a broad family of regularizers and a new learning rate schedule. For some regularizers, our regret bounds also improve upon prior results even when uniqueness holds. We further provide an application of our results to the decoupled exploration and exploitation problem, demonstrating that our techniques are broadly applicable.  ( 3 min )
    Node-oriented Spectral Filtering for Graph Neural Networks. (arXiv:2212.03654v3 [cs.LG] UPDATED)
    Graph neural networks (GNNs) have shown remarkable performance on homophilic graph data while being far less impressive when handling non-homophilic graph data due to the inherent low-pass filtering property of GNNs. In general, since real-world graphs are often complex mixtures of diverse subgraph patterns, learning a universal spectral filter on the graph from the global perspective as in most current works may still suffer from great difficulty in adapting to the variation of local patterns. On the basis of the theoretical analysis of local patterns, we rethink the existing spectral filtering methods and propose the node-oriented spectral filtering for graph neural network (namely NFGNN). By estimating the node-oriented spectral filter for each node, NFGNN is provided with the capability of precise local node positioning via the generalized translated operator, thus discriminating the variations of local homophily patterns adaptively. Meanwhile, the utilization of re-parameterization brings a good trade-off between global consistency and local sensibility for learning the node-oriented spectral filters. Furthermore, we theoretically analyze the localization property of NFGNN, demonstrating that the signal after adaptive filtering is still positioned around the corresponding node. Extensive experimental results demonstrate that the proposed NFGNN achieves more favorable performance.  ( 3 min )
    Artificial intelligence in government: Concepts, standards, and a unified framework. (arXiv:2210.17218v2 [cs.CY] UPDATED)
    Recent advances in artificial intelligence (AI), especially in generative language modelling, hold the promise of transforming government. Given the advanced capabilities of new AI systems, it is critical that these are embedded using standard operational procedures, clear epistemic criteria, and behave in alignment with the normative expectations of society. Scholars in multiple domains have subsequently begun to conceptualize the different forms that AI applications may take, highlighting both their potential benefits and pitfalls. However, the literature remains fragmented, with researchers in social science disciplines like public administration and political science, and the fast-moving fields of AI, ML, and robotics, all developing concepts in relative isolation. Although there are calls to formalize the emerging study of AI in government, a balanced account that captures the full depth of theoretical perspectives needed to understand the consequences of embedding AI into a public sector context is lacking. Here, we unify efforts across social and technical disciplines by first conducting an integrative literature review to identify and cluster 69 key terms that frequently co-occur in the multidisciplinary study of AI. We then build on the results of this bibliometric analysis to propose three new multifaceted concepts for understanding and analysing AI-based systems for government (AI-GOV) in a more unified way: (1) operational fitness, (2) epistemic alignment, and (3) normative divergence. Finally, we put these concepts to work by using them as dimensions in a conceptual typology of AI-GOV and connecting each with emerging AI technical measurement standards to encourage operationalization, foster cross-disciplinary dialogue, and stimulate debate among those aiming to rethink government with AI.  ( 3 min )
    Explanations Based on Item Response Theory (eXirt): A Model-Specific Method to Explain Tree-Ensemble Model in Trust Perspective. (arXiv:2210.09933v2 [cs.LG] UPDATED)
    In recent years, XAI researchers have been formalizing proposals and developing new methods to explain black box models, with no general consensus in the community on which method to use to explain these models, with this choice being almost directly linked to the popularity of a specific method. Methods such as Ciu, Dalex, Eli5, Lofo, Shap and Skater emerged with the proposal to explain black box models through global rankings of feature relevance, which based on different methodologies, generate global explanations that indicate how the model's inputs explain its predictions. In this context, 41 datasets, 4 tree-ensemble algorithms (Light Gradient Boosting, CatBoost, Random Forest, and Gradient Boosting), and 6 XAI methods were used to support the launch of a new XAI method, called eXirt, based on Item Response Theory - IRT and aimed at tree-ensemble black box models that use tabular data referring to binary classification problems. In the first set of analyses, the 164 global feature relevance ranks of the eXirt were compared with 984 ranks of the other XAI methods present in the literature, seeking to highlight their similarities and differences. In a second analysis, exclusive explanations of the eXirt based on Explanation-by-example were presented that help in understanding the model trust. Thus, it was verified that eXirt is able to generate global explanations of tree-ensemble models and also local explanations of instances of models through IRT, showing how this consolidated theory can be used in machine learning in order to obtain explainable and reliable models.  ( 3 min )
    Towards Better Generalization with Flexible Representation of Multi-Module Graph Neural Networks. (arXiv:2209.06589v4 [cs.LG] UPDATED)
    Graph neural networks (GNNs) have become compelling models designed to perform learning and inference on graph-structured data. However, little work has been done to understand the fundamental limitations of GNNs for scaling to larger graphs and generalizing to out-of-distribution (OOD) inputs. In this paper, we use a random graph generator to systematically investigate how the graph size and structural properties affect the predictive performance of GNNs. We present specific evidence that the average node degree is a key feature in determining whether GNNs can generalize to unseen graphs, and that the use of multiple node update functions can improve the generalization performance of GNNs when dealing with graphs of multimodal degree distributions. Accordingly, we propose a multi-module GNN framework that allows the network to adapt flexibly to new graphs by generalizing a single canonical nonlinear transformation over aggregated inputs. Our results show that the multi-module GNNs improve the OOD generalization on a variety of inference tasks in the direction of diverse structural features.  ( 2 min )
    Out-of-Distribution Detection in Time-Series Domain: A Novel Seasonal Ratio Scoring Approach. (arXiv:2207.04306v3 [cs.LG] UPDATED)
    Safe deployment of time-series classifiers for real-world applications relies on the ability to detect the data which is not generated from the same distribution as training data. This task is referred to as out-of-distribution (OOD) detection. We consider the novel problem of OOD detection for the time-series domain. We discuss the unique challenges posed by time-series data and explain why prior methods from the image domain will perform poorly. Motivated by these challenges, this paper proposes a novel {\em Seasonal Ratio Scoring (SRS)} approach. SRS consists of three key algorithmic steps. First, each input is decomposed into class-wise semantic component and remainder. Second, this decomposition is employed to estimate the class-wise conditional likelihoods of the input and remainder using deep generative models. The seasonal ratio score is computed from these estimates. Third, a threshold interval is identified from the in-distribution data to detect OOD examples. Experiments on diverse real-world benchmarks demonstrate that the SRS method is well-suited for time-series OOD detection when compared to baseline methods. Open-source code for SRS method is provided at https://github.com/tahabelkhouja/SRS  ( 3 min )
    On the Convergence to a Global Solution of Shuffling-Type Gradient Algorithms. (arXiv:2206.05869v2 [cs.LG] UPDATED)
    Stochastic gradient descent (SGD) algorithm is the method of choice in many machine learning tasks thanks to its scalability and efficiency in dealing with large-scale problems. In this paper, we focus on the shuffling version of SGD which matches the mainstream practical heuristics. We show the convergence to a global solution of shuffling SGD for a class of non-convex functions under over-parameterized settings. Our analysis employs more relaxed non-convex assumptions than previous literature. Nevertheless, we maintain the desired computational complexity as shuffling SGD has achieved in the general convex setting.  ( 2 min )
    Finite-time Analysis of Globally Nonstationary Multi-Armed Bandits. (arXiv:2107.11419v2 [stat.ML] UPDATED)
    We consider nonstationary multi-armed bandit problems where the model parameters of the arms change over time. We introduce the adaptive resetting bandit (ADR-bandit), a bandit algorithm class that leverages adaptive windowing techniques from literature on data streams. We first provide new guarantees on the quality of estimators resulting from adaptive windowing techniques, which are of independent interest. Furthermore, we conduct a finite-time analysis of ADR-bandit in two typical environments: an abrupt environment where changes occur instantaneously and a gradual environment where changes occur progressively. We demonstrate that ADR-bandit has nearly optimal performance when abrupt or gradual changes occur in a coordinated manner that we call global changes. We demonstrate that forced exploration is unnecessary when we assume such global changes. Unlike the existing nonstationary bandit algorithms, ADR-bandit has optimal performance in stationary environments as well as nonstationary environments with global changes. Our experiments show that the proposed algorithms outperform the existing approaches in synthetic and real-world environments.  ( 2 min )
    Characterizing the Implicit Bias of Regularized SGD in Rank Minimization. (arXiv:2206.05794v6 [cs.LG] UPDATED)
    We study the bias of Stochastic Gradient Descent (SGD) to learn low-rank weight matrices when training deep neural networks. Our results show that training neural networks with mini-batch SGD and weight decay causes a bias towards rank minimization over the weight matrices. Specifically, we show, both theoretically and empirically, that this bias is more pronounced when using smaller batch sizes, higher learning rates, or increased weight decay. Additionally, we predict and observe empirically that weight decay is necessary to achieve this bias. Unlike previous literature, our analysis does not rely on assumptions about the data, convergence, or optimality of the weight matrices and applies to a wide range of neural network architectures of any width or depth. Finally, we empirically investigate the connection between this bias and generalization, finding that it has a marginal effect on generalization.  ( 2 min )
    No-Regret Online Reinforcement Learning with Adversarial Losses and Transitions. (arXiv:2305.17380v3 [cs.LG] UPDATED)
    Existing online learning algorithms for adversarial Markov Decision Processes achieve ${O}(\sqrt{T})$ regret after $T$ rounds of interactions even if the loss functions are chosen arbitrarily by an adversary, with the caveat that the transition function has to be fixed. This is because it has been shown that adversarial transition functions make no-regret learning impossible. Despite such impossibility results, in this work, we develop algorithms that can handle both adversarial losses and adversarial transitions, with regret increasing smoothly in the degree of maliciousness of the adversary. More concretely, we first propose an algorithm that enjoys $\widetilde{{O}}(\sqrt{T} + C^{\textsf{P}})$ regret where $C^{\textsf{P}}$ measures how adversarial the transition functions are and can be at most ${O}(T)$. While this algorithm itself requires knowledge of $C^{\textsf{P}}$, we further develop a black-box reduction approach that removes this requirement. Moreover, we also show that further refinements of the algorithm not only maintains the same regret bound, but also simultaneously adapts to easier environments (where losses are generated in a certain stochastically constrained manner as in Jin et al. [2021]) and achieves $\widetilde{{O}}(U + \sqrt{UC^{\textsf{L}}} + C^{\textsf{P}})$ regret, where $U$ is some standard gap-dependent coefficient and $C^{\textsf{L}}$ is the amount of corruption on losses.
    COPF: Continual Learning Human Preference through Optimal Policy Fitting. (arXiv:2310.15694v3 [cs.LG] UPDATED)
    The technique of Reinforcement Learning from Human Feedback (RLHF) is a commonly employed method to improve pre-trained Language Models (LM), enhancing their ability to conform to human preferences. Nevertheless, the current RLHF-based LMs necessitate full retraining each time novel queries or feedback are introduced, which becomes a challenging task because human preferences can vary between different domains or tasks. Retraining LMs poses practical difficulties in many real-world situations due to the significant time and computational resources required, along with concerns related to data privacy. To address this limitation, we propose a new method called Continual Optimal Policy Fitting (COPF), in which we estimate a series of optimal policies using the Monte Carlo method, and then continually fit the policy sequence with the function regularization. COPF involves a single learning phase and doesn't necessitate complex reinforcement learning. Importantly, it shares the capability with RLHF to learn from unlabeled data, making it flexible for continual preference learning. Our experimental results show that COPF outperforms strong Continuous learning (CL) baselines when it comes to consistently aligning with human preferences on different tasks and domains.
    Spontaneous Symmetry Breaking in Generative Diffusion Models. (arXiv:2305.19693v3 [cs.LG] UPDATED)
    Generative diffusion models have recently emerged as a leading approach for generating high-dimensional data. In this paper, we show that the dynamics of these models exhibit a spontaneous symmetry breaking that divides the generative dynamics into two distinct phases: 1) A linear steady-state dynamics around a central fixed-point and 2) an attractor dynamics directed towards the data manifold. These two "phases" are separated by the change in stability of the central fixed-point, with the resulting window of instability being responsible for the diversity of the generated samples. Using both theoretical and empirical evidence, we show that an accurate simulation of the early dynamics does not significantly contribute to the final generation, since early fluctuations are reverted to the central fixed point. To leverage this insight, we propose a Gaussian late initialization scheme, which significantly improves model performance, achieving up to 3x FID improvements on fast samplers, while also increasing sample diversity (e.g., racial composition of generated CelebA images). Our work offers a new way to understand the generative dynamics of diffusion models that has the potential to bring about higher performance and less biased fast-samplers.
    Scaling Data-Constrained Language Models. (arXiv:2305.16264v4 [cs.CL] UPDATED)
    The current trend of scaling language models involves increasing both parameter count and training dataset size. Extrapolating this trend suggests that training dataset size may soon be limited by the amount of text data available on the internet. Motivated by this limit, we investigate scaling language models in data-constrained regimes. Specifically, we run a large set of experiments varying the extent of data repetition and compute budget, ranging up to 900 billion training tokens and 9 billion parameter models. We find that with constrained data for a fixed compute budget, training with up to 4 epochs of repeated data yields negligible changes to loss compared to having unique data. However, with more repetition, the value of adding compute eventually decays to zero. We propose and empirically validate a scaling law for compute optimality that accounts for the decreasing value of repeated tokens and excess parameters. Finally, we experiment with approaches mitigating data scarcity, including augmenting the training dataset with code data or removing commonly used filters. Models and datasets from our 400 training runs are freely available at https://github.com/huggingface/datablations.
    A weighted-variance variational autoencoder model for speech enhancement. (arXiv:2211.00990v2 [cs.SD] CROSS LISTED)
    We address speech enhancement based on variational autoencoders, which involves learning a speech prior distribution in the time-frequency (TF) domain. A zero-mean complex-valued Gaussian distribution is usually assumed for the generative model, where the speech information is encoded in the variance as a function of a latent variable. In contrast to this commonly used approach, we propose a weighted variance generative model, where the contribution of each spectrogram time-frame in parameter learning is weighted. We impose a Gamma prior distribution on the weights, which would effectively lead to a Student's t-distribution instead of Gaussian for speech generative modeling. We develop efficient training and speech enhancement algorithms based on the proposed generative model. Our experimental results on spectrogram auto-encoding and speech enhancement demonstrate the effectiveness and robustness of the proposed approach compared to the standard unweighted variance model.
    SACSoN: Scalable Autonomous Control for Social Navigation. (arXiv:2306.01874v3 [cs.RO] UPDATED)
    Machine learning provides a powerful tool for building socially compliant robotic systems that go beyond simple predictive models of human behavior. By observing and understanding human interactions from past experiences, learning can enable effective social navigation behaviors directly from data. In this paper, our goal is to develop methods for training policies for socially unobtrusive navigation, such that robots can navigate among humans in ways that don't disturb human behavior. We introduce a definition for such behavior based on the counterfactual perturbation of the human: if the robot had not intruded into the space, would the human have acted in the same way? By minimizing this counterfactual perturbation, we can induce robots to behave in ways that do not alter the natural behavior of humans in the shared space. Instantiating this principle requires training policies to minimize their effect on human behavior, and this in turn requires data that allows us to model the behavior of humans in the presence of robots. Therefore, our approach is based on two key contributions. First, we collect a large dataset where an indoor mobile robot interacts with human bystanders. Second, we utilize this dataset to train policies that minimize counterfactual perturbation. We provide supplementary videos and make publicly available the largest-of-its-kind visual navigation dataset on our project page.
    Finding Regions of Counterfactual Explanations via Robust Optimization. (arXiv:2301.11113v3 [cs.LG] UPDATED)
    Counterfactual explanations play an important role in detecting bias and improving the explainability of data-driven classification models. A counterfactual explanation (CE) is a minimal perturbed data point for which the decision of the model changes. Most of the existing methods can only provide one CE, which may not be achievable for the user. In this work we derive an iterative method to calculate robust CEs, i.e. CEs that remain valid even after the features are slightly perturbed. To this end, our method provides a whole region of CEs allowing the user to choose a suitable recourse to obtain a desired outcome. We use algorithmic ideas from robust optimization and prove convergence results for the most common machine learning methods including logistic regression, decision trees, random forests, and neural networks. Our experiments show that our method can efficiently generate globally optimal robust CEs for a variety of common data sets and classification models.
    Control3Diff: Learning Controllable 3D Diffusion Models from Single-view Images. (arXiv:2304.06700v2 [cs.CV] UPDATED)
    Diffusion models have recently become the de-facto approach for generative modeling in the 2D domain. However, extending diffusion models to 3D is challenging due to the difficulties in acquiring 3D ground truth data for training. On the other hand, 3D GANs that integrate implicit 3D representations into GANs have shown remarkable 3D-aware generation when trained only on single-view image datasets. However, 3D GANs do not provide straightforward ways to precisely control image synthesis. To address these challenges, We present Control3Diff, a 3D diffusion model that combines the strengths of diffusion models and 3D GANs for versatile, controllable 3D-aware image synthesis for single-view datasets. Control3Diff explicitly models the underlying latent distribution (optionally conditioned on external inputs), thus enabling direct control during the diffusion process. Moreover, our approach is general and applicable to any type of controlling input, allowing us to train it with the same diffusion objective without any auxiliary supervision. We validate the efficacy of Control3Diff on standard image generation benchmarks, including FFHQ, AFHQ, and ShapeNet, using various conditioning inputs such as images, sketches, and text prompts. Please see the project website (\url{https://jiataogu.me/control3diff}) for video comparisons.
    Robust Covariate Shift Adaptation for Density-Ratio Estimation. (arXiv:2310.16638v2 [stat.ME] UPDATED)
    Consider a scenario where we have access to train data with both covariates and outcomes while test data only contains covariates. In this scenario, our primary aim is to predict the missing outcomes of the test data. With this objective in mind, we train parametric regression models under a covariate shift, where covariate distributions are different between the train and test data. For this problem, existing studies have proposed covariate shift adaptation via importance weighting using the density ratio. This approach averages the train data losses, each weighted by an estimated ratio of the covariate densities between the train and test data, to approximate the test-data risk. Although it allows us to obtain a test-data risk minimizer, its performance heavily relies on the accuracy of the density ratio estimation. Moreover, even if the density ratio can be consistently estimated, the estimation errors of the density ratio also yield bias in the estimators of the regression model's parameters of interest. To mitigate these challenges, we introduce a doubly robust estimator for covariate shift adaptation via importance weighting, which incorporates an additional estimator for the regression function. Leveraging double machine learning techniques, our estimator reduces the bias arising from the density ratio estimation errors. We demonstrate the asymptotic distribution of the regression parameter estimator. Notably, our estimator remains consistent if either the density ratio estimator or the regression function is consistent, showcasing its robustness against potential errors in density ratio estimation. Finally, we confirm the soundness of our proposed method via simulation studies.
    Generalizing to new geometries with Geometry-Aware Autoregressive Models (GAAMs) for fast calorimeter simulation. (arXiv:2305.11531v3 [physics.ins-det] UPDATED)
    Generation of simulated detector response to collision products is crucial to data analysis in particle physics, but computationally very expensive. One subdetector, the calorimeter, dominates the computational time due to the high granularity of its cells and complexity of the interactions. Generative models can provide more rapid sample production, but currently require significant effort to optimize performance for specific detector geometries, often requiring many models to describe the varying cell sizes and arrangements, without the ability to generalize to other geometries. We develop a $\textit{geometry-aware}$ autoregressive model, which learns how the calorimeter response varies with geometry, and is capable of generating simulated responses to unseen geometries without additional training. The geometry-aware model outperforms a baseline unaware model by over $50\%$ in several metrics such as the Wasserstein distance between the generated and the true distributions of key quantities which summarize the simulated response. A single geometry-aware model could replace the hundreds of generative models currently designed for calorimeter simulation by physicists analyzing data collected at the Large Hadron Collider. This proof-of-concept study motivates the design of a foundational model that will be a crucial tool for the study of future detectors, dramatically reducing the large upfront investment usually needed to develop generative calorimeter models.
    Risk-Averse Model Uncertainty for Distributionally Robust Safe Reinforcement Learning. (arXiv:2301.12593v2 [cs.LG] UPDATED)
    Many real-world domains require safe decision making in uncertain environments. In this work, we introduce a deep reinforcement learning framework for approaching this important problem. We consider a distribution over transition models, and apply a risk-averse perspective towards model uncertainty through the use of coherent distortion risk measures. We provide robustness guarantees for this framework by showing it is equivalent to a specific class of distributionally robust safe reinforcement learning problems. Unlike existing approaches to robustness in deep reinforcement learning, however, our formulation does not involve minimax optimization. This leads to an efficient, model-free implementation of our approach that only requires standard data collection from a single training environment. In experiments on continuous control tasks with safety constraints, we demonstrate that our framework produces robust performance and safety at deployment time across a range of perturbed test environments.
    Efficient Sensor Placement from Regression with Sparse Gaussian Processes in Continuous and Discrete Spaces. (arXiv:2303.00028v6 [cs.RO] UPDATED)
    The sensor placement problem is a common problem that arises when monitoring correlated phenomena, such as temperature, precipitation, and salinity. Existing approaches to this problem typically formulate it as the maximization of information metrics, such as mutual information~(MI), and use optimization methods such as greedy algorithms in discrete domains, and derivative-free optimization methods such as genetic algorithms in continuous domains. However, computing MI for sensor placement requires discretizing the environment, and its computation cost depends on the size of the discretized environment. This limitation restricts these approaches from scaling to large problems. We have uncovered a novel connection between the sensor placement problem and sparse Gaussian processes~(SGP). Our approach leverages SGPs and is gradient-based, which allows us to efficiently find solution placements in continuous environments. We generalize our method to also handle discrete environments. Our experimental results on four real-world datasets demonstrate that our approach generates sensor placements consistently on par with or better than the prior state-of-the-art approaches in terms of both MI and reconstruction quality, all while being significantly faster. Our computationally efficient approach enables both large-scale sensor placement and fast robotic sensor placement for informative path planning algorithms.
    Convolutional Visual Prompt for Robust Visual Perception. (arXiv:2303.00198v2 [cs.CV] UPDATED)
    Vision models are often vulnerable to out-of-distribution (OOD) samples without adapting. While visual prompts offer a lightweight method of input-space adaptation for large-scale vision models, they rely on a high-dimensional additive vector and labeled data. This leads to overfitting when adapting models in a self-supervised test-time setting without labels. We introduce convolutional visual prompts (CVP) for label-free test-time adaptation for robust visual perception. The structured nature of CVP demands fewer trainable parameters, less than 1\% compared to standard visual prompts, combating overfitting. Extensive experiments and analysis on a wide variety of OOD visual perception tasks show that our approach is effective, improving robustness by up to 5.87% over several large-scale models.
    Variance Reduced Halpern Iteration for Finite-Sum Monotone Inclusions. (arXiv:2310.02987v2 [cs.LG] UPDATED)
    Machine learning approaches relying on such criteria as adversarial robustness or multi-agent settings have raised the need for solving game-theoretic equilibrium problems. Of particular relevance to these applications are methods targeting finite-sum structure, which generically arises in empirical variants of learning problems in these contexts. Further, methods with computable approximation errors are highly desirable, as they provide verifiable exit criteria. Motivated by these applications, we study finite-sum monotone inclusion problems, which model broad classes of equilibrium problems. Our main contributions are variants of the classical Halpern iteration that employ variance reduction to obtain improved complexity guarantees in which $n$ component operators in the finite sum are ``on average'' either cocoercive or Lipschitz continuous and monotone, with parameter $L$. The resulting oracle complexity of our methods, which provide guarantees for the last iterate and for a (computable) operator norm residual, is $\widetilde{\mathcal{O}}( n + \sqrt{n}L\varepsilon^{-1})$, which improves upon existing methods by a factor up to $\sqrt{n}$. This constitutes the first variance reduction-type result for general finite-sum monotone inclusions and for more specific problems such as convex-concave optimization when operator norm residual is the optimality measure. We further argue that, up to poly-logarithmic factors, this complexity is unimprovable in the monotone Lipschitz setting; i.e., the provided result is near-optimal.
    NLI4CT: Multi-Evidence Natural Language Inference for Clinical Trial Reports. (arXiv:2305.03598v2 [cs.CL] UPDATED)
    How can we interpret and retrieve medical evidence to support clinical decisions? Clinical trial reports (CTR) amassed over the years contain indispensable information for the development of personalized medicine. However, it is practically infeasible to manually inspect over 400,000+ clinical trial reports in order to find the best evidence for experimental treatments. Natural Language Inference (NLI) offers a potential solution to this problem, by allowing the scalable computation of textual entailment. However, existing NLI models perform poorly on biomedical corpora, and previously published datasets fail to capture the full complexity of inference over CTRs. In this work, we present a novel resource to advance research on NLI for reasoning on CTRs. The resource includes two main tasks. Firstly, to determine the inference relation between a natural language statement, and a CTR. Secondly, to retrieve supporting facts to justify the predicted relation. We provide NLI4CT, a corpus of 2400 statements and CTRs, annotated for these tasks. Baselines on this corpus expose the limitations of existing NLI models, with 6 state-of-the-art NLI models achieving a maximum F1 score of 0.627. To the best of our knowledge, we are the first to design a task that covers the interpretation of full CTRs. To encourage further work on this challenging dataset, we make the corpus, competition leaderboard, website and code to replicate the baseline experiments available at: https://github.com/ai-systems/nli4ct
    Unifying GANs and Score-Based Diffusion as Generative Particle Models. (arXiv:2305.16150v2 [cs.LG] UPDATED)
    Particle-based deep generative models, such as gradient flows and score-based diffusion models, have recently gained traction thanks to their striking performance. Their principle of displacing particle distributions using differential equations is conventionally seen as opposed to the previously widespread generative adversarial networks (GANs), which involve training a pushforward generator network. In this paper we challenge this interpretation, and propose a novel framework that unifies particle and adversarial generative models by framing generator training as a generalization of particle models. This suggests that a generator is an optional addition to any such generative model. Consequently, integrating a generator into a score-based diffusion model and training a GAN without a generator naturally emerge from our framework. We empirically test the viability of these original models as proofs of concepts of potential applications of our framework.
    SpatialRank: Urban Event Ranking with NDCG Optimization on Spatiotemporal Data. (arXiv:2310.00270v5 [cs.LG] UPDATED)
    The problem of urban event ranking aims at predicting the top-k most risky locations of future events such as traffic accidents and crimes. This problem is of fundamental importance to public safety and urban administration especially when limited resources are available. The problem is, however, challenging due to complex and dynamic spatio-temporal correlations between locations, uneven distribution of urban events in space, and the difficulty to correctly rank nearby locations with similar features. Prior works on event forecasting mostly aim at accurately predicting the actual risk score or counts of events for all the locations. Rankings obtained as such usually have low quality due to prediction errors. Learning-to-rank methods directly optimize measures such as Normalized Discounted Cumulative Gain (NDCG), but cannot handle the spatiotemporal autocorrelation existing among locations. In this paper, we bridge the gap by proposing a novel spatial event ranking approach named SpatialRank. SpatialRank features adaptive graph convolution layers that dynamically learn the spatiotemporal dependencies across locations from data. In addition, the model optimizes through surrogates a hybrid NDCG loss with a spatial component to better rank neighboring spatial locations. We design an importance-sampling with a spatial filtering algorithm to effectively evaluate the loss during training. Comprehensive experiments on three real-world datasets demonstrate that SpatialRank can effectively identify the top riskiest locations of crimes and traffic accidents and outperform state-of-art methods in terms of NDCG by up to 12.7%.
    Hierarchical clustering with OWA-based linkages, the Lance-Williams formula, and dendrogram inversions. (arXiv:2303.05683v2 [stat.ML] UPDATED)
    Agglomerative hierarchical clustering based on Ordered Weighted Averaging (OWA) operators not only generalises the single, complete, and average linkages, but also includes intercluster distances based on a few nearest or farthest neighbours, trimmed and winsorised means of pairwise point similarities, amongst many others. We explore the relationships between the famous Lance-Williams update formula and the extended OWA-based linkages with weights generated via infinite coefficient sequences. Furthermore, we provide some conditions for the weight generators to guarantee the resulting dendrograms to be free from unaesthetic inversions.
    Leave-one-out Distinguishability in Machine Learning. (arXiv:2309.17310v3 [cs.LG] UPDATED)
    We introduce a new analytical framework to quantify the changes in a machine learning algorithm's output distribution following the inclusion of a few data points in its training set, a notion we define as leave-one-out distinguishability (LOOD). This problem is key to measuring data **memorization** and **information leakage** in machine learning, and the **influence** of training data points on model predictions. We illustrate how our method broadens and refines existing empirical measures of memorization and privacy risks associated with training data. We use Gaussian processes to model the randomness of machine learning algorithms, and validate LOOD with extensive empirical analysis of information leakage using membership inference attacks. Our theoretical framework enables us to investigate the causes of information leakage and where the leakage is high. For example, we analyze the influence of activation functions, on data memorization. Additionally, our method allows us to optimize queries that disclose the most significant information about the training data in the leave-one-out setting. We illustrate how optimal queries can be used for accurate **reconstruction** of training data.
    Adaptive whitening with fast gain modulation and slow synaptic plasticity. (arXiv:2308.13633v2 [q-bio.NC] UPDATED)
    Neurons in early sensory areas rapidly adapt to changing sensory statistics, both by normalizing the variance of their individual responses and by reducing correlations between their responses. Together, these transformations may be viewed as an adaptive form of statistical whitening. Existing mechanistic models of adaptive whitening exclusively use either synaptic plasticity or gain modulation as the biological substrate for adaptation; however, on their own, each of these models has significant limitations. In this work, we unify these approaches in a normative multi-timescale mechanistic model that adaptively whitens its responses with complementary computational roles for synaptic plasticity and gain modulation. Gains are modified on a fast timescale to adapt to the current statistical context, whereas synapses are modified on a slow timescale to match structural properties of the input statistics that are invariant across contexts. Our model is derived from a novel multi-timescale whitening objective that factorizes the inverse whitening matrix into basis vectors, which correspond to synaptic weights, and a diagonal matrix, which corresponds to neuronal gains. We test our model on synthetic and natural datasets and find that the synapses learn optimal configurations over long timescales that enable adaptive whitening on short timescales using gain modulation.
    Efficient Adversarial Contrastive Learning via Robustness-Aware Coreset Selection. (arXiv:2302.03857v5 [cs.LG] UPDATED)
    Adversarial contrastive learning (ACL) does not require expensive data annotations but outputs a robust representation that withstands adversarial attacks and also generalizes to a wide range of downstream tasks. However, ACL needs tremendous running time to generate the adversarial variants of all training data, which limits its scalability to large datasets. To speed up ACL, this paper proposes a robustness-aware coreset selection (RCS) method. RCS does not require label information and searches for an informative subset that minimizes a representational divergence, which is the distance of the representation between natural data and their virtual adversarial variants. The vanilla solution of RCS via traversing all possible subsets is computationally prohibitive. Therefore, we theoretically transform RCS into a surrogate problem of submodular maximization, of which the greedy search is an efficient solution with an optimality guarantee for the original problem. Empirically, our comprehensive results corroborate that RCS can speed up ACL by a large margin without significantly hurting the robustness transferability. Notably, to the best of our knowledge, we are the first to conduct ACL efficiently on the large-scale ImageNet-1K dataset to obtain an effective robust representation via RCS. Our source code is at https://github.com/GodXuxilie/Efficient_ACL_via_RCS.
    Diversified Outlier Exposure for Out-of-Distribution Detection via Informative Extrapolation. (arXiv:2310.13923v2 [cs.LG] UPDATED)
    Out-of-distribution (OOD) detection is important for deploying reliable machine learning models on real-world applications. Recent advances in outlier exposure have shown promising results on OOD detection via fine-tuning model with informatively sampled auxiliary outliers. However, previous methods assume that the collected outliers can be sufficiently large and representative to cover the boundary between ID and OOD data, which might be impractical and challenging. In this work, we propose a novel framework, namely, Diversified Outlier Exposure (DivOE), for effective OOD detection via informative extrapolation based on the given auxiliary outliers. Specifically, DivOE introduces a new learning objective, which diversifies the auxiliary distribution by explicitly synthesizing more informative outliers for extrapolation during training. It leverages a multi-step optimization method to generate novel outliers beyond the original ones, which is compatible with many variants of outlier exposure. Extensive experiments and analyses have been conducted to characterize and demonstrate the effectiveness of the proposed DivOE. The code is publicly available at: https://github.com/tmlr-group/DivOE.
    Monte Carlo guided Diffusion for Bayesian linear inverse problems. (arXiv:2308.07983v2 [stat.ML] UPDATED)
    Ill-posed linear inverse problems arise frequently in various applications, from computational photography to medical imaging. A recent line of research exploits Bayesian inference with informative priors to handle the ill-posedness of such problems. Amongst such priors, score-based generative models (SGM) have recently been successfully applied to several different inverse problems. In this study, we exploit the particular structure of the prior defined by the SGM to define a sequence of intermediate linear inverse problems. As the noise level decreases, the posteriors of these inverse problems get closer to the target posterior of the original inverse problem. To sample from this sequence of posteriors, we propose the use of Sequential Monte Carlo (SMC) methods. The proposed algorithm, MCGDiff, is shown to be theoretically grounded and we provide numerical simulations showing that it outperforms competing baselines when dealing with ill-posed inverse problems in a Bayesian setting.
    EqDrive: Efficient Equivariant Motion Forecasting with Multi-Modality for Autonomous Driving. (arXiv:2310.17540v1 [cs.RO])
    Forecasting vehicular motions in autonomous driving requires a deep understanding of agent interactions and the preservation of motion equivariance under Euclidean geometric transformations. Traditional models often lack the sophistication needed to handle the intricate dynamics inherent to autonomous vehicles and the interaction relationships among agents in the scene. As a result, these models have a lower model capacity, which then leads to higher prediction errors and lower training efficiency. In our research, we employ EqMotion, a leading equivariant particle, and human prediction model that also accounts for invariant agent interactions, for the task of multi-agent vehicle motion forecasting. In addition, we use a multi-modal prediction mechanism to account for multiple possible future paths in a probabilistic manner. By leveraging EqMotion, our model achieves state-of-the-art (SOTA) performance with fewer parameters (1.2 million) and a significantly reduced training time (less than 2 hours).
    On Embeddings for Numerical Features in Tabular Deep Learning. (arXiv:2203.05556v4 [cs.LG] UPDATED)
    Recently, Transformer-like deep architectures have shown strong performance on tabular data problems. Unlike traditional models, e.g., MLP, these architectures map scalar values of numerical features to high-dimensional embeddings before mixing them in the main backbone. In this work, we argue that embeddings for numerical features are an underexplored degree of freedom in tabular DL, which allows constructing more powerful DL models and competing with GBDT on some traditionally GBDT-friendly benchmarks. We start by describing two conceptually different approaches to building embedding modules: the first one is based on a piecewise linear encoding of scalar values, and the second one utilizes periodic activations. Then, we empirically demonstrate that these two approaches can lead to significant performance boosts compared to the embeddings based on conventional blocks such as linear layers and ReLU activations. Importantly, we also show that embedding numerical features is beneficial for many backbones, not only for Transformers. Specifically, after proper embeddings, simple MLP-like models can perform on par with the attention-based architectures. Overall, we highlight embeddings for numerical features as an important design aspect with good potential for further improvements in tabular DL.
    Optimal Scoring Rule Design under Partial Knowledge. (arXiv:2107.07420v2 [cs.GT] UPDATED)
    This paper studies the design of optimal proper scoring rules when the principal has partial knowledge of an agent's signal distribution. Recent work characterizes the proper scoring rules that maximize the increase of an agent's payoff when the agent chooses to access a costly signal to refine a posterior belief from her prior prediction, under the assumption that the agent's signal distribution is fully known to the principal. In our setting, the principal only knows about a set of distributions where the agent's signal distribution belongs. We formulate the scoring rule design problem as a max-min optimization that maximizes the worst-case increase in payoff across the set of distributions. We propose an efficient algorithm to compute an optimal scoring rule when the set of distributions is finite, and devise a fully polynomial-time approximation scheme that accommodates various infinite sets of distributions. We further remark that widely used scoring rules, such as the quadratic and log rules, as well as previously identified optimal scoring rules under full knowledge, can be far from optimal in our partial knowledge settings.
    Towards Unifying Diffusion Models for Probabilistic Spatio-Temporal Graph Learning. (arXiv:2310.17360v1 [cs.LG])
    Spatio-temporal graph learning is a fundamental problem in the Web of Things era, which enables a plethora of Web applications such as smart cities, human mobility and climate analysis. Existing approaches tackle different learning tasks independently, tailoring their models to unique task characteristics. These methods, however, fall short of modeling intrinsic uncertainties in the spatio-temporal data. Meanwhile, their specialized designs limit their universality as general spatio-temporal learning solutions. In this paper, we propose to model the learning tasks in a unified perspective, viewing them as predictions based on conditional information with shared spatio-temporal patterns. Based on this proposal, we introduce Unified Spatio-Temporal Diffusion Models (USTD) to address the tasks uniformly within the uncertainty-aware diffusion framework. USTD is holistically designed, comprising a shared spatio-temporal encoder and attention-based denoising networks that are task-specific. The shared encoder, optimized by a pre-training strategy, effectively captures conditional spatio-temporal patterns. The denoising networks, utilizing both cross- and self-attention, integrate conditional dependencies and generate predictions. Opting for forecasting and kriging as downstream tasks, we design Gated Attention (SGA) and Temporal Gated Attention (TGA) for each task, with different emphases on the spatial and temporal dimensions, respectively. By combining the advantages of deterministic encoders and probabilistic diffusion models, USTD achieves state-of-the-art performances compared to deterministic and probabilistic baselines in both tasks, while also providing valuable uncertainty estimates.
    Human-Guided Complexity-Controlled Abstractions. (arXiv:2310.17550v1 [cs.LG])
    Neural networks often learn task-specific latent representations that fail to generalize to novel settings or tasks. Conversely, humans learn discrete representations (i.e., concepts or words) at a variety of abstraction levels (e.g., ``bird'' vs. ``sparrow'') and deploy the appropriate abstraction based on task. Inspired by this, we train neural models to generate a spectrum of discrete representations, and control the complexity of the representations (roughly, how many bits are allocated for encoding inputs) by tuning the entropy of the distribution over representations. In finetuning experiments, using only a small number of labeled examples for a new task, we show that (1) tuning the representation to a task-appropriate complexity level supports the highest finetuning performance, and (2) in a human-participant study, users were able to identify the appropriate complexity level for a downstream task using visualizations of discrete representations. Our results indicate a promising direction for rapid model finetuning by leveraging human insight.
    Variance of ML-based software fault predictors: are we really improving fault prediction?. (arXiv:2310.17264v1 [cs.SE])
    Software quality assurance activities become increasingly difficult as software systems become more and more complex and continuously grow in size. Moreover, testing becomes even more expensive when dealing with large-scale systems. Thus, to effectively allocate quality assurance resources, researchers have proposed fault prediction (FP) which utilizes machine learning (ML) to predict fault-prone code areas. However, ML algorithms typically make use of stochastic elements to increase the prediction models' generalizability and efficiency of the training process. These stochastic elements, also known as nondeterminism-introducing (NI) factors, lead to variance in the training process and as a result, lead to variance in prediction accuracy and training time. This variance poses a challenge for reproducibility in research. More importantly, while fault prediction models may have shown good performance in the lab (e.g., often-times involving multiple runs and averaging outcomes), high variance of results can pose the risk that these models show low performance when applied in practice. In this work, we experimentally analyze the variance of a state-of-the-art fault prediction approach. Our experimental results indicate that NI factors can indeed cause considerable variance in the fault prediction models' accuracy. We observed a maximum variance of 10.10% in terms of the per-class accuracy metric. We thus, also discuss how to deal with such variance.
    Optimization dependent generalization bound for ReLU networks based on sensitivity in the tangent bundle. (arXiv:2310.17378v1 [cs.LG])
    Recent advances in deep learning have given us some very promising results on the generalization ability of deep neural networks, however literature still lacks a comprehensive theory explaining why heavily over-parametrized models are able to generalize well while fitting the training data. In this paper we propose a PAC type bound on the generalization error of feedforward ReLU networks via estimating the Rademacher complexity of the set of networks available from an initial parameter vector via gradient descent. The key idea is to bound the sensitivity of the network's gradient to perturbation of the input data along the optimization trajectory. The obtained bound does not explicitly depend on the depth of the network. Our results are experimentally verified on the MNIST and CIFAR-10 datasets.
    Sign Languague Recognition without frame-sequencing constraints: A proof of concept on the Argentinian Sign Language. (arXiv:2310.17437v1 [cs.CV])
    Automatic sign language recognition (SLR) is an important topic within the areas of human-computer interaction and machine learning. On the one hand, it poses a complex challenge that requires the intervention of various knowledge areas, such as video processing, image processing, intelligent systems and linguistics. On the other hand, robust recognition of sign language could assist in the translation process and the integration of hearing-impaired people, as well as the teaching of sign language for the hearing population. SLR systems usually employ Hidden Markov Models, Dynamic Time Warping or similar models to recognize signs. Such techniques exploit the sequential ordering of frames to reduce the number of hypothesis. This paper presents a general probabilistic model for sign classification that combines sub-classifiers based on different types of features such as position, movement and handshape. The model employs a bag-of-words approach in all classification steps, to explore the hypothesis that ordering is not essential for recognition. The proposed model achieved an accuracy rate of 97% on an Argentinian Sign Language dataset containing 64 classes of signs and 3200 samples, providing some evidence that indeed recognition without ordering is possible.
    Orchestration of Emulator Assisted Mobile Edge Tuning for AI Foundation Models: A Multi-Agent Deep Reinforcement Learning Approach. (arXiv:2310.17492v1 [cs.AI])
    The efficient deployment and fine-tuning of foundation models are pivotal in contemporary artificial intelligence. In this study, we present a groundbreaking paradigm integrating Mobile Edge Computing (MEC) with foundation models, specifically designed to enhance local task performance on user equipment (UE). Central to our approach is the innovative Emulator-Adapter architecture, segmenting the foundation model into two cohesive modules. This design not only conserves computational resources but also ensures adaptability and fine-tuning efficiency for downstream tasks. Additionally, we introduce an advanced resource allocation mechanism that is fine-tuned to the needs of the Emulator-Adapter structure in decentralized settings. To address the challenges presented by this system, we employ a hybrid multi-agent Deep Reinforcement Learning (DRL) strategy, adept at handling mixed discrete-continuous action spaces, ensuring dynamic and optimal resource allocations. Our comprehensive simulations and validations underscore the practical viability of our approach, demonstrating its robustness, efficiency, and scalability. Collectively, this work offers a fresh perspective on deploying foundation models and balancing computational efficiency with task proficiency.
    Secure short-term load forecasting for smart grids with transformer-based federated learning. (arXiv:2310.17477v1 [cs.LG])
    Electricity load forecasting is an essential task within smart grids to assist demand and supply balance. While advanced deep learning models require large amounts of high-resolution data for accurate short-term load predictions, fine-grained load profiles can expose users' electricity consumption behaviors, which raises privacy and security concerns. One solution to improve data privacy is federated learning, where models are trained locally on private data, and only the trained model parameters are merged and updated on a global server. Therefore, this paper presents a novel transformer-based deep learning approach with federated learning for short-term electricity load prediction. To evaluate our results, we benchmark our federated learning architecture against central and local learning and compare the performance of our model to long short-term memory models and convolutional neural networks. Our simulations are based on a dataset from a German university campus and show that transformer-based forecasting is a promising alternative to state-of-the-art models within federated learning.
    De-novo Chemical Reaction Generation by Means of Temporarily Convolutional Neural Networks. (arXiv:2310.17341v1 [cs.LG])
    We present here a combination of two networks, Recurrent Neural Networks (RNN) and Temporarily Convolutional Neural Networks (TCN) in de novo reaction generation using the novel Reaction Smiles-like representation of reactions (CGRSmiles) with atom mapping directly incorporated. Recurrent Neural Networks are known for their autoregressive properties and are frequently used in language modelling with direct application to SMILES generation. The relatively novel TCNs possess similar properties with wide receptive field while obeying the causality required for natural language processing (NLP). The combination of both latent representations expressed through TCN and RNN results in an overall better performance compared to RNN alone. Additionally, it is shown that different fine-tuning protocols have a profound impact on generative scope of the model when applied on a dataset of interest via transfer learning.
    Enhancing Graph Neural Networks with Structure-Based Prompt. (arXiv:2310.17394v1 [cs.LG])
    Graph Neural Networks (GNNs) are powerful in learning semantics of graph data. Recently, a new paradigm "pre-train, prompt" has shown promising results in adapting GNNs to various tasks with less supervised data. The success of such paradigm can be attributed to the more consistent objectives of pre-training and task-oriented prompt tuning, where the pre-trained knowledge can be effectively transferred to downstream tasks. However, an overlooked issue of existing studies is that the structure information of graph is usually exploited during pre-training for learning node representations, while neglected in the prompt tuning stage for learning task-specific parameters. To bridge this gap, we propose a novel structure-based prompting method for GNNs, namely SAP, which consistently exploits structure information in both pre-training and prompt tuning stages. In particular, SAP 1) employs a dual-view contrastive learning to align the latent semantic spaces of node attributes and graph structure, and 2) incorporates structure information in prompted graph to elicit more pre-trained knowledge in prompt tuning. We conduct extensive experiments on node classification and graph classification tasks to show the effectiveness of SAP. Moreover, we show that SAP can lead to better performance in more challenging few-shot scenarios on both homophilous and heterophilous graphs.
    Bayesian Neural Controlled Differential Equations for Treatment Effect Estimation. (arXiv:2310.17463v1 [cs.LG])
    Treatment effect estimation in continuous time is crucial for personalized medicine. However, existing methods for this task are limited to point estimates of the potential outcomes, whereas uncertainty estimates have been ignored. Needless to say, uncertainty quantification is crucial for reliable decision-making in medical applications. To fill this gap, we propose a novel Bayesian neural controlled differential equation (BNCDE) for treatment effect estimation in continuous time. In our BNCDE, the time dimension is modeled through a coupled system of neural controlled differential equations and neural stochastic differential equations, where the neural stochastic differential equations allow for tractable variational Bayesian inference. Thereby, for an assigned sequence of treatments, our BNCDE provides meaningful posterior predictive distributions of the potential outcomes. To the best of our knowledge, ours is the first tailored neural method to provide uncertainty estimates of treatment effects in continuous time. As such, our method is of direct practical value for promoting reliable decision-making in medicine.
    Detection Defenses: An Empty Promise against Adversarial Patch Attacks on Optical Flow. (arXiv:2310.17403v1 [cs.CV])
    Adversarial patches undermine the reliability of optical flow predictions when placed in arbitrary scene locations. Therefore, they pose a realistic threat to real-world motion detection and its downstream applications. Potential remedies are defense strategies that detect and remove adversarial patches, but their influence on the underlying motion prediction has not been investigated. In this paper, we thoroughly examine the currently available detect-and-remove defenses ILP and LGS for a wide selection of state-of-the-art optical flow methods, and illuminate their side effects on the quality and robustness of the final flow predictions. In particular, we implement defense-aware attacks to investigate whether current defenses are able to withstand attacks that take the defense mechanism into account. Our experiments yield two surprising results: Detect-and-remove defenses do not only lower the optical flow quality on benign scenes, in doing so, they also harm the robustness under patch attacks for all tested optical flow methods except FlowNetC. As currently employed detect-and-remove defenses fail to deliver the promised adversarial robustness for optical flow, they evoke a false sense of security. The code is available at https://github.com/cv-stuttgart/DetectionDefenses.
    Likelihood-based Out-of-Distribution Detection with Denoising Diffusion Probabilistic Models. (arXiv:2310.17432v1 [cs.LG])
    Out-of-Distribution detection between dataset pairs has been extensively explored with generative models. We show that likelihood-based Out-of-Distribution detection can be extended to diffusion models by leveraging the fact that they, like other likelihood-based generative models, are dramatically affected by the input sample complexity. Currently, all Out-of-Distribution detection methods with Diffusion Models are reconstruction-based. We propose a new likelihood ratio for Out-of-Distribution detection with Deep Denoising Diffusion Models, which we call the Complexity Corrected Likelihood Ratio. Our likelihood ratio is constructed using Evidence Lower-Bound evaluations from an individual model at various noising levels. We present results that are comparable to state-of-the-art Out-of-Distribution detection methods with generative models.
    Cross-modal Active Complementary Learning with Self-refining Correspondence. (arXiv:2310.17468v1 [cs.CV])
    Recently, image-text matching has attracted more and more attention from academia and industry, which is fundamental to understanding the latent correspondence across visual and textual modalities. However, most existing methods implicitly assume the training pairs are well-aligned while ignoring the ubiquitous annotation noise, a.k.a noisy correspondence (NC), thereby inevitably leading to a performance drop. Although some methods attempt to address such noise, they still face two challenging problems: excessive memorizing/overfitting and unreliable correction for NC, especially under high noise. To address the two problems, we propose a generalized Cross-modal Robust Complementary Learning framework (CRCL), which benefits from a novel Active Complementary Loss (ACL) and an efficient Self-refining Correspondence Correction (SCC) to improve the robustness of existing methods. Specifically, ACL exploits active and complementary learning losses to reduce the risk of providing erroneous supervision, leading to theoretically and experimentally demonstrated robustness against NC. SCC utilizes multiple self-refining processes with momentum correction to enlarge the receptive field for correcting correspondences, thereby alleviating error accumulation and achieving accurate and stable corrections. We carry out extensive experiments on three image-text benchmarks, i.e., Flickr30K, MS-COCO, and CC152K, to verify the superior robustness of our CRCL against synthetic and real-world noisy correspondences.
    Causal Modeling with Stationary Diffusions. (arXiv:2310.17405v1 [cs.LG])
    We develop a novel approach towards causal inference. Rather than structural equations over a causal graph, we learn stochastic differential equations (SDEs) whose stationary densities model a system's behavior under interventions. These stationary diffusion models do not require the formalism of causal graphs, let alone the common assumption of acyclicity. We show that in several cases, they generalize to unseen interventions on their variables, often better than classical approaches. Our inference method is based on a new theoretical result that expresses a stationarity condition on the diffusion's generator in a reproducing kernel Hilbert space. The resulting kernel deviation from stationarity (KDS) is an objective function of independent interest.
    Towards Learning Monocular 3D Object Localization From 2D Labels using the Physical Laws of Motion. (arXiv:2310.17462v1 [cs.CV])
    We present a novel method for precise 3D object localization in single images from a single calibrated camera using only 2D labels. No expensive 3D labels are needed. Thus, instead of using 3D labels, our model is trained with easy-to-annotate 2D labels along with the physical knowledge of the object's motion. Given this information, the model can infer the latent third dimension, even though it has never seen this information during training. Our method is evaluated on both synthetic and real-world datasets, and we are able to achieve a mean distance error of just 6 cm in our experiments on real data. The results indicate the method's potential as a step towards learning 3D object location estimation, where collecting 3D data for training is not feasible.
    On the recognition of the game type based on physiological signals and eye tracking. (arXiv:2310.17383v1 [cs.LG])
    Automated interpretation of signals yields many impressive applications from the area of affective computing and human activity recognition (HAR). In this paper we ask the question about possibility of cognitive activity recognition on the base of particular set of signals. We use recognition of the game played by the participant as a playground for exploration of the problem. We build classifier of three different games (Space Invaders, Tetris, Tower Defence) and inter-game pause. We validate classifier in the player-independent and player-dependent scenario. We discuss the improvement in the player-dependent scenario in the context of biometric person recognition. On the base of the results obtained in game classification, we consider potential applications in smart surveillance and quantified self.
    Tackling Interference Induced by Data Training Loops in A/B Tests: A Weighted Training Approach. (arXiv:2310.17496v1 [stat.ME])
    In modern recommendation systems, the standard pipeline involves training machine learning models on historical data to predict user behaviors and improve recommendations continuously. However, these data training loops can introduce interference in A/B tests, where data generated by control and treatment algorithms, potentially with different distributions, are combined. To address these challenges, we introduce a novel approach called weighted training. This approach entails training a model to predict the probability of each data point appearing in either the treatment or control data and subsequently applying weighted losses during model training. We demonstrate that this approach achieves the least variance among all estimators without causing shifts in the training distributions. Through simulation studies, we demonstrate the lower bias and variance of our approach compared to other methods.
    The IMS Toucan System for the Blizzard Challenge 2023. (arXiv:2310.17499v1 [cs.CL])
    For our contribution to the Blizzard Challenge 2023, we improved on the system we submitted to the Blizzard Challenge 2021. Our approach entails a rule-based text-to-phoneme processing system that includes rule-based disambiguation of homographs in the French language. It then transforms the phonemes to spectrograms as intermediate representations using a fast and efficient non-autoregressive synthesis architecture based on Conformer and Glow. A GAN based neural vocoder that combines recent state-of-the-art approaches converts the spectrogram to the final wave. We carefully designed the data processing, training, and inference procedures for the challenge data. Our system identifier is G. Open source code and demo are available.
    Handshape recognition for Argentinian Sign Language using ProbSom. (arXiv:2310.17427v1 [cs.CV])
    Automatic sign language recognition is an important topic within the areas of human-computer interaction and machine learning. On the one hand, it poses a complex challenge that requires the intervention of various knowledge areas, such as video processing, image processing, intelligent systems and linguistics. On the other hand, robust recognition of sign language could assist in the translation process and the integration of hearing-impaired people. This paper offers two main contributions: first, the creation of a database of handshapes for the Argentinian Sign Language (LSA), which is a topic that has barely been discussed so far. Secondly, a technique for image processing, descriptor extraction and subsequent handshape classification using a supervised adaptation of self-organizing maps that is called ProbSom. This technique is compared to others in the state of the art, such as Support Vector Machines (SVM), Random Forests, and Neural Networks. The database that was built contains 800 images with 16 LSA handshapes, and is a first step towards building a comprehensive database of Argentinian signs. The ProbSom-based neural classifier, using the proposed descriptor, achieved an accuracy rate above 90%.
    Learning Regularized Graphon Mean-Field Games with Unknown Graphons. (arXiv:2310.17531v1 [cs.GT])
    We design and analyze reinforcement learning algorithms for Graphon Mean-Field Games (GMFGs). In contrast to previous works that require the precise values of the graphons, we aim to learn the Nash Equilibrium (NE) of the regularized GMFGs when the graphons are unknown. Our contributions are threefold. First, we propose the Proximal Policy Optimization for GMFG (GMFG-PPO) algorithm and show that it converges at a rate of $O(T^{-1/3})$ after $T$ iterations with an estimation oracle, improving on a previous work by Xie et al. (ICML, 2021). Second, using kernel embedding of distributions, we design efficient algorithms to estimate the transition kernels, reward functions, and graphons from sampled agents. Convergence rates are then derived when the positions of the agents are either known or unknown. Results for the combination of the optimization algorithm GMFG-PPO and the estimation algorithm are then provided. These algorithms are the first specifically designed for learning graphons from sampled agents. Finally, the efficacy of the proposed algorithms are corroborated through simulations. These simulations demonstrate that learning the unknown graphons reduces the exploitability effectively.
    Interactive Robot Learning from Verbal Correction. (arXiv:2310.17555v1 [cs.RO])
    The ability to learn and refine behavior after deployment has become ever more important for robots as we design them to operate in unstructured environments like households. In this work, we design a new learning system based on large language model (LLM), OLAF, that allows everyday users to teach a robot using verbal corrections when the robot makes mistakes, e.g., by saying "Stop what you're doing. You should move closer to the cup." A key feature of OLAF is its ability to update the robot's visuomotor neural policy based on the verbal feedback to avoid repeating mistakes in the future. This is in contrast to existing LLM-based robotic systems, which only follow verbal commands or corrections but not learn from them. We demonstrate the efficacy of our design in experiments where a user teaches a robot to perform long-horizon manipulation tasks both in simulation and on physical hardware, achieving on average 20.0% improvement in policy success rate. Videos and more results are at https://ut-austin-rpl.github.io/olaf/
    Invariance Measures for Neural Networks. (arXiv:2310.17404v1 [cs.LG])
    Invariances in neural networks are useful and necessary for many tasks. However, the representation of the invariance of most neural network models has not been characterized. We propose measures to quantify the invariance of neural networks in terms of their internal representation. The measures are efficient and interpretable, and can be applied to any neural network model. They are also more sensitive to invariance than previously defined measures. We validate the measures and their properties in the domain of affine transformations and the CIFAR10 and MNIST datasets, including their stability and interpretability. Using the measures, we perform a first analysis of CNN models and show that their internal invariance is remarkably stable to random weight initializations, but not to changes in dataset or transformation. We believe the measures will enable new avenues of research in invariance representation.
    The Expressive Power of Low-Rank Adaptation. (arXiv:2310.17513v1 [cs.LG])
    Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method that leverages low-rank adaptation of weight matrices, has emerged as a prevalent technique for fine-tuning pre-trained models such as large language models and diffusion models. Despite its huge success in practice, the theoretical underpinnings of LoRA have largely remained unexplored. This paper takes the first step to bridge this gap by theoretically analyzing the expressive power of LoRA. We prove that, for fully connected neural networks, LoRA can adapt any model $f$ to accurately represent any smaller target model $\overline{f}$ if LoRA-rank $\geq(\text{width of }f) \times \frac{\text{depth of }\overline{f}}{\text{depth of }f}$. We also quantify the approximation error when LoRA-rank is lower than the threshold. For Transformer networks, we show any model can be adapted to a target model of the same size with rank-$(\frac{\text{embedding size}}{2})$ LoRA adapters.
    Coalitional Bargaining via Reinforcement Learning: An Application to Collaborative Vehicle Routing. (arXiv:2310.17458v1 [cs.LG])
    Collaborative Vehicle Routing is where delivery companies cooperate by sharing their delivery information and performing delivery requests on behalf of each other. This achieves economies of scale and thus reduces cost, greenhouse gas emissions, and road congestion. But which company should partner with whom, and how much should each company be compensated? Traditional game theoretic solution concepts, such as the Shapley value or nucleolus, are difficult to calculate for the real-world problem of Collaborative Vehicle Routing due to the characteristic function scaling exponentially with the number of agents. This would require solving the Vehicle Routing Problem (an NP-Hard problem) an exponential number of times. We therefore propose to model this problem as a coalitional bargaining game where - crucially - agents are not given access to the characteristic function. Instead, we implicitly reason about the characteristic function, and thus eliminate the need to evaluate the VRP an exponential number of times - we only need to evaluate it once. Our contribution is that our decentralised approach is both scalable and considers the self-interested nature of companies. The agents learn using a modified Independent Proximal Policy Optimisation. Our RL agents outperform a strong heuristic bot. The agents correctly identify the optimal coalitions 79% of the time with an average optimality gap of 4.2% and reduction in run-time of 62%.
    CBD: A Certified Backdoor Detector Based on Local Dominant Probability. (arXiv:2310.17498v1 [cs.LG])
    Backdoor attack is a common threat to deep neural networks. During testing, samples embedded with a backdoor trigger will be misclassified as an adversarial target by a backdoored model, while samples without the backdoor trigger will be correctly classified. In this paper, we present the first certified backdoor detector (CBD), which is based on a novel, adjustable conformal prediction scheme based on our proposed statistic local dominant probability. For any classifier under inspection, CBD provides 1) a detection inference, 2) the condition under which the attacks are guaranteed to be detectable for the same classification domain, and 3) a probabilistic upper bound for the false positive rate. Our theoretical results show that attacks with triggers that are more resilient to test-time noise and have smaller perturbation magnitudes are more likely to be detected with guarantees. Moreover, we conduct extensive experiments on four benchmark datasets considering various backdoor types, such as BadNet, CB, and Blend. CBD achieves comparable or even higher detection accuracy than state-of-the-art detectors, and it in addition provides detection certification. Notably, for backdoor attacks with random perturbation triggers bounded by $\ell_2\leq0.75$ which achieves more than 90\% attack success rate, CBD achieves 100\% (98\%), 100\% (84\%), 98\% (98\%), and 72\% (40\%) empirical (certified) detection true positive rates on the four benchmark datasets GTSRB, SVHN, CIFAR-10, and TinyImageNet, respectively, with low false positive rates.
    Hierarchical Ensemble-Based Feature Selection for Time Series Forecasting. (arXiv:2310.17544v1 [cs.LG])
    We study a novel ensemble approach for feature selection based on hierarchical stacking in cases of non-stationarity and limited number of samples with large number of features. Our approach exploits the co-dependency between features using a hierarchical structure. Initially, a machine learning model is trained using a subset of features, and then the model's output is updated using another algorithm with the remaining features to minimize the target loss. This hierarchical structure allows for flexible depth and feature selection. By exploiting feature co-dependency hierarchically, our proposed approach overcomes the limitations of traditional feature selection methods and feature importance scores. The effectiveness of the approach is demonstrated on synthetic and real-life datasets, indicating improved performance with scalability and stability compared to the traditional methods and state-of-the-art approaches.
    A Challenge in Reweighting Data with Bilevel Optimization. (arXiv:2310.17386v1 [stat.ML])
    In many scenarios, one uses a large training set to train a model with the goal of performing well on a smaller testing set with a different distribution. Learning a weight for each data point of the training set is an appealing solution, as it ideally allows one to automatically learn the importance of each training point for generalization on the testing set. This task is usually formalized as a bilevel optimization problem. Classical bilevel solvers are based on a warm-start strategy where both the parameters of the models and the data weights are learned at the same time. We show that this joint dynamic may lead to sub-optimal solutions, for which the final data weights are very sparse. This finding illustrates the difficulty of data reweighting and offers a clue as to why this method is rarely used in practice.
    Foundation Model Based Native AI Framework in 6G with Cloud-Edge-End Collaboration. (arXiv:2310.17471v1 [cs.IT])
    Future wireless communication networks are in a position to move beyond data-centric, device-oriented connectivity and offer intelligent, immersive experiences based on task-oriented connections, especially in the context of the thriving development of pre-trained foundation models (PFM) and the evolving vision of 6G native artificial intelligence (AI). Therefore, redefining modes of collaboration between devices and servers and constructing native intelligence libraries become critically important in 6G. In this paper, we analyze the challenges of achieving 6G native AI from the perspectives of data, intelligence, and networks. Then, we propose a 6G native AI framework based on foundation models, provide a customization approach for intent-aware PFM, present a construction of a task-oriented AI toolkit, and outline a novel cloud-edge-end collaboration paradigm. As a practical use case, we apply this framework for orchestration, achieving the maximum sum rate within a wireless communication system, and presenting preliminary evaluation results. Finally, we outline research directions for achieving native AI in 6G.
    Fair collaborative vehicle routing: A deep multi-agent reinforcement learning approach. (arXiv:2310.17485v1 [cs.LG])
    Collaborative vehicle routing occurs when carriers collaborate through sharing their transportation requests and performing transportation requests on behalf of each other. This achieves economies of scale, thus reducing cost, greenhouse gas emissions and road congestion. But which carrier should partner with whom, and how much should each carrier be compensated? Traditional game theoretic solution concepts are expensive to calculate as the characteristic function scales exponentially with the number of agents. This would require solving the vehicle routing problem (NP-hard) an exponential number of times. We therefore propose to model this problem as a coalitional bargaining game solved using deep multi-agent reinforcement learning, where - crucially - agents are not given access to the characteristic function. Instead, we implicitly reason about the characteristic function; thus, when deployed in production, we only need to evaluate the expensive post-collaboration vehicle routing problem once. Our contribution is that we are the first to consider both the route allocation problem and gain sharing problem simultaneously - without access to the expensive characteristic function. Through decentralised machine learning, our agents bargain with each other and agree to outcomes that correlate well with the Shapley value - a fair profit allocation mechanism. Importantly, we are able to achieve a reduction in run-time of 88%.
    Benign Oscillation of Stochastic Gradient Descent with Large Learning Rates. (arXiv:2310.17074v1 [cs.LG])
    In this work, we theoretically investigate the generalization properties of neural networks (NN) trained by stochastic gradient descent (SGD) algorithm with large learning rates. Under such a training regime, our finding is that, the oscillation of the NN weights caused by the large learning rate SGD training turns out to be beneficial to the generalization of the NN, which potentially improves over the same NN trained by SGD with small learning rates that converges more smoothly. In view of this finding, we call such a phenomenon "benign oscillation". Our theory towards demystifying such a phenomenon builds upon the feature learning perspective of deep learning. Specifically, we consider a feature-noise data generation model that consists of (i) weak features which have a small $\ell_2$-norm and appear in each data point; (ii) strong features which have a larger $\ell_2$-norm but only appear in a certain fraction of all data points; and (iii) noise. We prove that NNs trained by oscillating SGD with a large learning rate can effectively learn the weak features in the presence of those strong features. In contrast, NNs trained by SGD with a small learning rate can only learn the strong features but makes little progress in learning the weak features. Consequently, when it comes to the new testing data which consist of only weak features, the NN trained by oscillating SGD with a large learning rate could still make correct predictions consistently, while the NN trained by small learning rate SGD fails. Our theory sheds light on how large learning rate training benefits the generalization of NNs. Experimental results demonstrate our finding on "benign oscillation".
    Demonstration-Regularized RL. (arXiv:2310.17303v1 [stat.ML])
    Incorporating expert demonstrations has empirically helped to improve the sample efficiency of reinforcement learning (RL). This paper quantifies theoretically to what extent this extra information reduces RL's sample complexity. In particular, we study the demonstration-regularized reinforcement learning that leverages the expert demonstrations by KL-regularization for a policy learned by behavior cloning. Our findings reveal that using $N^{\mathrm{E}}$ expert demonstrations enables the identification of an optimal policy at a sample complexity of order $\widetilde{\mathcal{O}}(\mathrm{Poly}(S,A,H)/(\varepsilon^2 N^{\mathrm{E}}))$ in finite and $\widetilde{\mathcal{O}}(\mathrm{Poly}(d,H)/(\varepsilon^2 N^{\mathrm{E}}))$ in linear Markov decision processes, where $\varepsilon$ is the target precision, $H$ the horizon, $A$ the number of action, $S$ the number of states in the finite case and $d$ the dimension of the feature space in the linear case. As a by-product, we provide tight convergence guarantees for the behaviour cloning procedure under general assumptions on the policy classes. Additionally, we establish that demonstration-regularized methods are provably efficient for reinforcement learning from human feedback (RLHF). In this respect, we provide theoretical evidence showing the benefits of KL-regularization for RLHF in tabular and linear MDPs. Interestingly, we avoid pessimism injection by employing computationally feasible regularization to handle reward estimation uncertainty, thus setting our approach apart from the prior works.
    Weakly-Supervised Surgical Phase Recognition. (arXiv:2310.17209v1 [cs.CV])
    A key element of computer-assisted surgery systems is phase recognition of surgical videos. Existing phase recognition algorithms require frame-wise annotation of a large number of videos, which is time and money consuming. In this work we join concepts of graph segmentation with self-supervised learning to derive a random-walk solution for per-frame phase prediction. Furthermore, we utilize within our method two forms of weak supervision: sparse timestamps or few-shot learning. The proposed algorithm enjoys low complexity and can operate in lowdata regimes. We validate our method by running experiments with the public Cholec80 dataset of laparoscopic cholecystectomy videos, demonstrating promising performance in multiple setups.
    On Forecast Stability. (arXiv:2310.17332v1 [cs.LG])
    Forecasts are typically not produced in a vacuum but in a business context, where forecasts are generated on a regular basis and interact with each other. For decisions, it may be important that forecasts do not change arbitrarily, and are stable in some sense. However, this area has received only limited attention in the forecasting literature. In this paper, we explore two types of forecast stability that we call vertical stability and horizontal stability. The existing works in the literature are only applicable to certain base models and extending these frameworks to be compatible with any base model is not straightforward. Furthermore, these frameworks can only stabilise the forecasts vertically. To fill this gap, we propose a simple linear-interpolation-based approach that is applicable to stabilise the forecasts provided by any base model vertically and horizontally. The approach can produce both accurate and stable forecasts. Using N-BEATS, Pooled Regression and LightGBM as the base models, in our evaluation on four publicly available datasets, the proposed framework is able to achieve significantly higher stability and/or accuracy compared to a set of benchmarks including a state-of-the-art forecast stabilisation method across three error metrics and six stability metrics.
    fairret: a Framework for Differentiable Fairness Regularization Terms. (arXiv:2310.17256v1 [cs.LG])
    Current tools for machine learning fairness only admit a limited range of fairness definitions and have seen little integration with automatic differentiation libraries, despite the central role these libraries play in modern machine learning pipelines. We introduce a framework of fairness regularization terms (fairrets) which quantify bias as modular objectives that are easily integrated in automatic differentiation pipelines. By employing a general definition of fairness in terms of linear-fractional statistics, a wide class of fairrets can be computed efficiently. Experiments show the behavior of their gradients and their utility in enforcing fairness with minimal loss of predictive power compared to baselines. Our contribution includes a PyTorch implementation of the fairret framework.
    Improving Denoising Diffusion Models via Simultaneous Estimation of Image and Noise. (arXiv:2310.17167v1 [cs.LG])
    This paper introduces two key contributions aimed at improving the speed and quality of images generated through inverse diffusion processes. The first contribution involves reparameterizing the diffusion process in terms of the angle on a quarter-circular arc between the image and noise, specifically setting the conventional $\displaystyle \sqrt{\bar{\alpha}}=\cos(\eta)$. This reparameterization eliminates two singularities and allows for the expression of diffusion evolution as a well-behaved ordinary differential equation (ODE). In turn, this allows higher order ODE solvers such as Runge-Kutta methods to be used effectively. The second contribution is to directly estimate both the image ($\mathbf{x}_0$) and noise ($\mathbf{\epsilon}$) using our network, which enables more stable calculations of the update step in the inverse diffusion steps, as accurate estimation of both the image and noise are crucial at different stages of the process. Together with these changes, our model achieves faster generation, with the ability to converge on high-quality images more quickly, and higher quality of the generated images, as measured by metrics such as Frechet Inception Distance (FID), spatial Frechet Inception Distance (sFID), precision, and recall.
    C-Disentanglement: Discovering Causally-Independent Generative Factors under an Inductive Bias of Confounder. (arXiv:2310.17325v1 [cs.LG])
    Representation learning assumes that real-world data is generated by a few semantically meaningful generative factors (i.e., sources of variation) and aims to discover them in the latent space. These factors are expected to be causally disentangled, meaning that distinct factors are encoded into separate latent variables, and changes in one factor will not affect the values of the others. Compared to statistical independence, causal disentanglement allows more controllable data generation, improved robustness, and better generalization. However, most existing work assumes unconfoundedness in the discovery process, that there are no common causes to the generative factors and thus obtain only statistical independence. In this paper, we recognize the importance of modeling confounders in discovering causal generative factors. Unfortunately, such factors are not identifiable without proper inductive bias. We fill the gap by introducing a framework entitled Confounded-Disentanglement (C-Disentanglement), the first framework that explicitly introduces the inductive bias of confounder via labels from domain expertise. In addition, we accordingly propose an approach to sufficiently identify the causally disentangled factors under any inductive bias of the confounder. We conduct extensive experiments on both synthetic and real-world datasets. Our method demonstrates competitive results compared to various SOTA baselines in obtaining causally disentangled features and downstream tasks under domain shifts.
    CQM: Curriculum Reinforcement Learning with a Quantized World Model. (arXiv:2310.17330v1 [cs.LG])
    Recent curriculum Reinforcement Learning (RL) has shown notable progress in solving complex tasks by proposing sequences of surrogate tasks. However, the previous approaches often face challenges when they generate curriculum goals in a high-dimensional space. Thus, they usually rely on manually specified goal spaces. To alleviate this limitation and improve the scalability of the curriculum, we propose a novel curriculum method that automatically defines the semantic goal space which contains vital information for the curriculum process, and suggests curriculum goals over it. To define the semantic goal space, our method discretizes continuous observations via vector quantized-variational autoencoders (VQ-VAE) and restores the temporal relations between the discretized observations by a graph. Concurrently, ours suggests uncertainty and temporal distance-aware curriculum goals that converges to the final goals over the automatically composed goal space. We demonstrate that the proposed method allows efficient explorations in an uninformed environment with raw goal examples only. Also, ours outperforms the state-of-the-art curriculum RL methods on data efficiency and performance, in various goal-reaching tasks even with ego-centric visual inputs.
    How do Language Models Bind Entities in Context?. (arXiv:2310.17191v1 [cs.LG])
    To correctly use in-context information, language models (LMs) must bind entities to their attributes. For example, given a context describing a "green square" and a "blue circle", LMs must bind the shapes to their respective colors. We analyze LM representations and identify the binding ID mechanism: a general mechanism for solving the binding problem, which we observe in every sufficiently large model from the Pythia and LLaMA families. Using causal interventions, we show that LMs' internal activations represent binding information by attaching binding ID vectors to corresponding entities and attributes. We further show that binding ID vectors form a continuous subspace, in which distances between binding ID vectors reflect their discernability. Overall, our results uncover interpretable strategies in LMs for representing symbolic knowledge in-context, providing a step towards understanding general in-context reasoning in large-scale LMs.
    A multi-artifact EEG denoising by frequency-based deep learning. (arXiv:2310.17335v1 [cs.LG])
    Electroencephalographic (EEG) signals are fundamental to neuroscience research and clinical applications such as brain-computer interfaces and neurological disorder diagnosis. These signals are typically a combination of neurological activity and noise, originating from various sources, including physiological artifacts like ocular and muscular movements. Under this setting, we tackle the challenge of distinguishing neurological activity from noise-related sources. We develop a novel EEG denoising model that operates in the frequency domain, leveraging prior knowledge about noise spectral features to adaptively compute optimal convolutional filters for noise separation. The model is trained to learn an empirical relationship connecting the spectral characteristics of noise and noisy signal to a non-linear transformation which allows signal denoising. Performance evaluation on the EEGdenoiseNet dataset shows that the proposed model achieves optimal results according to both temporal and spectral metrics. The model is found to remove physiological artifacts from input EEG data, thus achieving effective EEG denoising. Indeed, the model performance either matches or outperforms that achieved by benchmark models, proving to effectively remove both muscle and ocular artifacts without the need to perform any training on the particular type of artifact.
    DPM-Solver-v3: Improved Diffusion ODE Solver with Empirical Model Statistics. (arXiv:2310.13268v2 [cs.CV] UPDATED)
    Diffusion probabilistic models (DPMs) have exhibited excellent performance for high-fidelity image generation while suffering from inefficient sampling. Recent works accelerate the sampling procedure by proposing fast ODE solvers that leverage the specific ODE form of DPMs. However, they highly rely on specific parameterization during inference (such as noise/data prediction), which might not be the optimal choice. In this work, we propose a novel formulation towards the optimal parameterization during sampling that minimizes the first-order discretization error of the ODE solution. Based on such formulation, we propose \textit{DPM-Solver-v3}, a new fast ODE solver for DPMs by introducing several coefficients efficiently computed on the pretrained model, which we call \textit{empirical model statistics}. We further incorporate multistep methods and a predictor-corrector framework, and propose some techniques for improving sample quality at small numbers of function evaluations (NFE) or large guidance scales. Experiments show that DPM-Solver-v3 achieves consistently better or comparable performance in both unconditional and conditional sampling with both pixel-space and latent-space DPMs, especially in 5$\sim$10 NFEs. We achieve FIDs of 12.21 (5 NFE), 2.51 (10 NFE) on unconditional CIFAR10, and MSE of 0.55 (5 NFE, 7.5 guidance scale) on Stable Diffusion, bringing a speed-up of 15\%$\sim$30\% compared to previous state-of-the-art training-free methods. Code is available at \url{https://github.com/thu-ml/DPM-Solver-v3}.
    Graphical Object-Centric Actor-Critic. (arXiv:2310.17178v1 [cs.AI])
    There have recently been significant advances in the problem of unsupervised object-centric representation learning and its application to downstream tasks. The latest works support the argument that employing disentangled object representations in image-based object-centric reinforcement learning tasks facilitates policy learning. We propose a novel object-centric reinforcement learning algorithm combining actor-critic and model-based approaches to utilize these representations effectively. In our approach, we use a transformer encoder to extract object representations and graph neural networks to approximate the dynamics of an environment. The proposed method fills a research gap in developing efficient object-centric world models for reinforcement learning settings that can be used for environments with discrete or continuous action spaces. Our algorithm performs better in a visually complex 3D robotic environment and a 2D environment with compositional structure than the state-of-the-art model-free actor-critic algorithm built upon transformer architecture and the state-of-the-art monolithic model-based algorithm.
    CROP: Conservative Reward for Model-based Offline Policy Optimization. (arXiv:2310.17245v1 [cs.LG])
    Offline reinforcement learning (RL) aims to optimize policy using collected data without online interactions. Model-based approaches are particularly appealing for addressing offline RL challenges due to their capability to mitigate the limitations of offline data through data generation using models. Prior research has demonstrated that introducing conservatism into the model or Q-function during policy optimization can effectively alleviate the prevalent distribution drift problem in offline RL. However, the investigation into the impacts of conservatism in reward estimation is still lacking. This paper proposes a novel model-based offline RL algorithm, Conservative Reward for model-based Offline Policy optimization (CROP), which conservatively estimates the reward in model training. To achieve a conservative reward estimation, CROP simultaneously minimizes the estimation error and the reward of random actions. Theoretical analysis shows that this conservative reward mechanism leads to a conservative policy evaluation and helps mitigate distribution drift. Experiments on D4RL benchmarks showcase that the performance of CROP is comparable to the state-of-the-art baselines. Notably, CROP establishes an innovative connection between offline and online RL, highlighting that offline RL problems can be tackled by adopting online RL techniques to the empirical Markov decision process trained with a conservative reward. The source code is available with https://github.com/G0K0URURI/CROP.git.
    Multi-scale Diffusion Denoised Smoothing. (arXiv:2310.16779v2 [cs.LG] UPDATED)
    Along with recent diffusion models, randomized smoothing has become one of a few tangible approaches that offers adversarial robustness to models at scale, e.g., those of large pre-trained models. Specifically, one can perform randomized smoothing on any classifier via a simple "denoise-and-classify" pipeline, so-called denoised smoothing, given that an accurate denoiser is available - such as diffusion model. In this paper, we present scalable methods to address the current trade-off between certified robustness and accuracy in denoised smoothing. Our key idea is to "selectively" apply smoothing among multiple noise scales, coined multi-scale smoothing, which can be efficiently implemented with a single diffusion model. This approach also suggests a new objective to compare the collective robustness of multi-scale smoothed classifiers, and questions which representation of diffusion model would maximize the objective. To address this, we propose to further fine-tune diffusion model (a) to perform consistent denoising whenever the original image is recoverable, but (b) to generate rather diverse outputs otherwise. Our experiments show that the proposed multi-scale smoothing scheme combined with diffusion fine-tuning enables strong certified robustness available with high noise level while maintaining its accuracy closer to non-smoothed classifiers.
    Network Design through Graph Neural Networks: Identifying Challenges and Improving Performance. (arXiv:2310.17100v1 [cs.LG])
    Graph Neural Network (GNN) research has produced strategies to modify a graph's edges using gradients from a trained GNN, with the goal of network design. However, the factors which govern gradient-based editing are understudied, obscuring why edges are chosen and if edits are grounded in an edge's importance. Thus, we begin by analyzing the gradient computation in previous works, elucidating the factors that influence edits and highlighting the potential over-reliance on structural properties. Specifically, we find that edges can achieve high gradients due to structural biases, rather than importance, leading to erroneous edits when the factors are unrelated to the design task. To improve editing, we propose ORE, an iterative editing method that (a) edits the highest scoring edges and (b) re-embeds the edited graph to refresh gradients, leading to less biased edge choices. We empirically study ORE through a set of proposed design tasks, each with an external validation method, demonstrating that ORE improves upon previous methods by up to 50%.
    Technical Note: Feasibility of translating 3.0T-trained Deep-Learning Segmentation Models Out-of-the-Box on Low-Field MRI 0.55T Knee-MRI of Healthy Controls. (arXiv:2310.17152v1 [cs.CV])
    In the current study, our purpose is to evaluate the feasibility of applying deep learning (DL) enabled algorithms to quantify bilateral knee biomarkers in healthy controls scanned at 0.55T and compared with 3.0T. The current study assesses the performance of standard in-practice bone, and cartilage segmentation algorithms at 0.55T, both qualitatively and quantitatively, in terms of comparing segmentation performance, areas of improvement, and compartment-wise cartilage thickness values between 0.55T vs. 3.0T. Initial results demonstrate a usable to good technical feasibility of translating existing quantitative deep-learning-based image segmentation techniques, trained on 3.0T, out of 0.55T for knee MRI, in a multi-vendor acquisition environment. Especially in terms of segmenting cartilage compartments, the models perform almost equivalent to 3.0T in terms of Likert ranking. The 0.55T low-field sustainable and easy-to-install MRI, as demonstrated, thus, can be utilized for evaluating knee cartilage thickness and bone segmentations aided by established DL algorithms trained at higher-field strengths out-of-the-box initially. This could be utilized at the far-spread point-of-care locations with a lack of radiologists available to manually segment low-field images, at least till a decent base of low-field data pool is collated. With further fine-tuning with manual labeling of low-field data or utilizing synthesized higher SNR images from low-field images, OA biomarker quantification performance is potentially guaranteed to be further improved.
    PID-Inspired Inductive Biases for Deep Reinforcement Learning in Partially Observable Control Tasks. (arXiv:2307.05891v2 [cs.LG] UPDATED)
    Deep reinforcement learning (RL) has shown immense potential for learning to control systems through data alone. However, one challenge deep RL faces is that the full state of the system is often not observable. When this is the case, the policy needs to leverage the history of observations to infer the current state. At the same time, differences between the training and testing environments makes it critical for the policy not to overfit to the sequence of observations it sees at training time. As such, there is an important balancing act between having the history encoder be flexible enough to extract relevant information, yet be robust to changes in the environment. To strike this balance, we look to the PID controller for inspiration. We assert the PID controller's success shows that only summing and differencing are needed to accumulate information over time for many control tasks. Following this principle, we propose two architectures for encoding history: one that directly uses PID features and another that extends these core ideas and can be used in arbitrary control tasks. When compared with prior approaches, our encoders produce policies that are often more robust and achieve better performance on a variety of tracking tasks. Going beyond tracking tasks, our policies achieve 1.7x better performance on average over previous state-of-the-art methods on a suite of locomotion control tasks.
    TabR: Tabular Deep Learning Meets Nearest Neighbors in 2023. (arXiv:2307.14338v2 [cs.LG] UPDATED)
    Deep learning (DL) models for tabular data problems (e.g. classification, regression) are currently receiving increasingly more attention from researchers. However, despite the recent efforts, the non-DL algorithms based on gradient-boosted decision trees (GBDT) remain a strong go-to solution for these problems. One of the research directions aimed at improving the position of tabular DL involves designing so-called retrieval-augmented models. For a target object, such models retrieve other objects (e.g. the nearest neighbors) from the available training data and use their features and labels to make a better prediction. In this work, we present TabR -- essentially, a feed-forward network with a custom k-Nearest-Neighbors-like component in the middle. On a set of public benchmarks with datasets up to several million objects, TabR marks a big step forward for tabular DL: it demonstrates the best average performance among tabular DL models, becomes the new state-of-the-art on several datasets, and even outperforms GBDT models on the recently proposed "GBDT-friendly" benchmark (see Figure 1). Among the important findings and technical details powering TabR, the main ones lie in the attention-like mechanism that is responsible for retrieving the nearest neighbors and extracting valuable signal from them. In addition to the much higher performance, TabR is simple and significantly more efficient compared to prior retrieval-based tabular DL models.
    DSAC-C: Constrained Maximum Entropy for Robust Discrete Soft-Actor Critic. (arXiv:2310.17173v1 [cs.LG])
    We present a novel extension to the family of Soft Actor-Critic (SAC) algorithms. We argue that based on the Maximum Entropy Principle, discrete SAC can be further improved via additional statistical constraints derived from a surrogate critic policy. Furthermore, our findings suggests that these constraints provide an added robustness against potential domain shifts, which are essential for safe deployment of reinforcement learning agents in the real-world. We provide theoretical analysis and show empirical results on low data regimes for both in-distribution and out-of-distribution variants of Atari 2600 games.
    miditok: A Python package for MIDI file tokenization. (arXiv:2310.17202v1 [cs.LG])
    Recent progress in natural language processing has been adapted to the symbolic music modality. Language models, such as Transformers, have been used with symbolic music for a variety of tasks among which music generation, modeling or transcription, with state-of-the-art performances. These models are beginning to be used in production products. To encode and decode music for the backbone model, they need to rely on tokenizers, whose role is to serialize music into sequences of distinct elements called tokens. MidiTok is an open-source library allowing to tokenize symbolic music with great flexibility and extended features. It features the most popular music tokenizations, under a unified API. It is made to be easily used and extensible for everyone.
    A Neural Collapse Perspective on Feature Evolution in Graph Neural Networks. (arXiv:2307.01951v2 [cs.LG] UPDATED)
    Graph neural networks (GNNs) have become increasingly popular for classification tasks on graph-structured data. Yet, the interplay between graph topology and feature evolution in GNNs is not well understood. In this paper, we focus on node-wise classification, illustrated with community detection on stochastic block model graphs, and explore the feature evolution through the lens of the "Neural Collapse" (NC) phenomenon. When training instance-wise deep classifiers (e.g. for image classification) beyond the zero training error point, NC demonstrates a reduction in the deepest features' within-class variability and an increased alignment of their class means to certain symmetric structures. We start with an empirical study that shows that a decrease in within-class variability is also prevalent in the node-wise classification setting, however, not to the extent observed in the instance-wise case. Then, we theoretically study this distinction. Specifically, we show that even an "optimistic" mathematical model requires that the graphs obey a strict structural condition in order to possess a minimizer with exact collapse. Interestingly, this condition is viable also for heterophilic graphs and relates to recent empirical studies on settings with improved GNNs' generalization. Furthermore, by studying the gradient dynamics of the theoretical model, we provide reasoning for the partial collapse observed empirically. Finally, we present a study on the evolution of within- and between-class feature variability across layers of a well-trained GNN and contrast the behavior with spectral methods.
    HCT: Hybrid Convnet-Transformer for Parkinson's disease detection and severity prediction from gait. (arXiv:2310.17078v1 [cs.CV])
    In this paper, we propose a novel deep learning method based on a new Hybrid ConvNet-Transformer architecture to detect and stage Parkinson's disease (PD) from gait data. We adopt a two-step approach by dividing the problem into two sub-problems. Our Hybrid ConvNet-Transformer model first distinguishes healthy versus parkinsonian patients. If the patient is parkinsonian, a multi-class Hybrid ConvNet-Transformer model determines the Hoehn and Yahr (H&Y) score to assess the PD severity stage. Our hybrid architecture exploits the strengths of both Convolutional Neural Networks (ConvNets) and Transformers to accurately detect PD and determine the severity stage. In particular, we take advantage of ConvNets to capture local patterns and correlations in the data, while we exploit Transformers for handling long-term dependencies in the input signal. We show that our hybrid method achieves superior performance when compared to other state-of-the-art methods, with a PD detection accuracy of 97% and a severity staging accuracy of 87%. Our source code is available at: https://github.com/SafwenNaimi
    Leveraging Ensemble Diversity for Robust Self-Training in the Presence of Sample Selection Bias. (arXiv:2310.14814v2 [cs.LG] UPDATED)
    Self-training is a well-known approach for semi-supervised learning. It consists of iteratively assigning pseudo-labels to unlabeled data for which the model is confident and treating them as labeled examples. For neural networks, softmax prediction probabilities are often used as a confidence measure, despite the fact that they are known to be overconfident, even for wrong predictions. This phenomenon is particularly intensified in the presence of sample selection bias, i.e., when data labeling is subject to some constraint. To address this issue, we propose a novel confidence measure, called $\mathcal{T}$-similarity, built upon the prediction diversity of an ensemble of linear classifiers. We provide the theoretical analysis of our approach by studying stationary points and describing the relationship between the diversity of the individual members and their performance. We empirically demonstrate the benefit of our confidence measure for three different pseudo-labeling policies on classification datasets of various data modalities.
    Joint Entity and Relation Extraction with Span Pruning and Hypergraph Neural Networks. (arXiv:2310.17238v1 [cs.CL])
    Entity and Relation Extraction (ERE) is an important task in information extraction. Recent marker-based pipeline models achieve state-of-the-art performance, but still suffer from the error propagation issue. Also, most of current ERE models do not take into account higher-order interactions between multiple entities and relations, while higher-order modeling could be beneficial.In this work, we propose HyperGraph neural network for ERE ($\hgnn{}$), which is built upon the PL-marker (a state-of-the-art marker-based pipleline model). To alleviate error propagation,we use a high-recall pruner mechanism to transfer the burden of entity identification and labeling from the NER module to the joint module of our model. For higher-order modeling, we build a hypergraph, where nodes are entities (provided by the span pruner) and relations thereof, and hyperedges encode interactions between two different relations or between a relation and its associated subject and object entities. We then run a hypergraph neural network for higher-order inference by applying message passing over the built hypergraph. Experiments on three widely used benchmarks (\acef{}, \ace{} and \scierc{}) for ERE task show significant improvements over the previous state-of-the-art PL-marker.
    Deep machine learning for meteor monitoring: advances with transfer learning and gradient-weighted class activation mapping. (arXiv:2310.16826v2 [astro-ph.EP] UPDATED)
    In recent decades, the use of optical detection systems for meteor studies has increased dramatically, resulting in huge amounts of data being analyzed. Automated meteor detection tools are essential for studying the continuous meteoroid incoming flux, recovering fresh meteorites, and achieving a better understanding of our Solar System. Concerning meteor detection, distinguishing false positives between meteor and non-meteor images has traditionally been performed by hand, which is significantly time-consuming. To address this issue, we developed a fully automated pipeline that uses Convolutional Neural Networks (CNNs) to classify candidate meteor detections. Our new method is able to detect meteors even in images that contain static elements such as clouds, the Moon, and buildings. To accurately locate the meteor within each frame, we employ the Gradient-weighted Class Activation Mapping (Grad-CAM) technique. This method facilitates the identification of the region of interest by multiplying the activations from the last convolutional layer with the average of the gradients across the feature map of that layer. By combining these findings with the activation map derived from the first convolutional layer, we effectively pinpoint the most probable pixel location of the meteor. We trained and evaluated our model on a large dataset collected by the Spanish Meteor Network (SPMN) and achieved a precision of 98\%. Our new methodology presented here has the potential to reduce the workload of meteor scientists and station operators and improve the accuracy of meteor tracking and classification.
    CEIL: Generalized Contextual Imitation Learning. (arXiv:2306.14534v2 [cs.LG] UPDATED)
    In this paper, we present \textbf{C}ont\textbf{E}xtual \textbf{I}mitation \textbf{L}earning~(CEIL), a general and broadly applicable algorithm for imitation learning (IL). Inspired by the formulation of hindsight information matching, we derive CEIL by explicitly learning a hindsight embedding function together with a contextual policy using the hindsight embeddings. To achieve the expert matching objective for IL, we advocate for optimizing a contextual variable such that it biases the contextual policy towards mimicking expert behaviors. Beyond the typical learning from demonstrations (LfD) setting, CEIL is a generalist that can be effectively applied to multiple settings including: 1)~learning from observations (LfO), 2)~offline IL, 3)~cross-domain IL (mismatched experts), and 4) one-shot IL settings. Empirically, we evaluate CEIL on the popular MuJoCo tasks (online) and the D4RL dataset (offline). Compared to prior state-of-the-art baselines, we show that CEIL is more sample-efficient in most online IL tasks and achieves better or competitive performances in offline tasks.
    Detecting and Mitigating Hallucinations in Multilingual Summarisation. (arXiv:2305.13632v2 [cs.CL] UPDATED)
    Hallucinations pose a significant challenge to the reliability of neural models for abstractive summarisation. While automatically generated summaries may be fluent, they often lack faithfulness to the original document. This issue becomes even more pronounced in low-resource settings, such as cross-lingual transfer. With the existing faithful metrics focusing on English, even measuring the extent of this phenomenon in cross-lingual settings is hard. To address this, we first develop a novel metric, mFACT, evaluating the faithfulness of non-English summaries, leveraging translation-based transfer from multiple English faithfulness metrics. We then propose a simple but effective method to reduce hallucinations with a cross-lingual transfer, which weighs the loss of each training example by its faithfulness score. Through extensive experiments in multiple languages, we demonstrate that mFACT is the metric that is most suited to detect hallucinations. Moreover, we find that our proposed loss weighting method drastically increases both performance and faithfulness according to both automatic and human evaluation when compared to strong baselines for cross-lingual transfer such as MAD-X. Our code and dataset are available at https://github.com/yfqiu-nlp/mfact-summ.
    Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations. (arXiv:2306.04618v2 [cs.CL] UPDATED)
    This paper reexamines the research on out-of-distribution (OOD) robustness in the field of NLP. We find that the distribution shift settings in previous studies commonly lack adequate challenges, hindering the accurate evaluation of OOD robustness. To address these issues, we propose a benchmark construction protocol that ensures clear differentiation and challenging distribution shifts. Then we introduce BOSS, a Benchmark suite for Out-of-distribution robustneSS evaluation covering 5 tasks and 20 datasets. Based on BOSS, we conduct a series of experiments on pre-trained language models for analysis and evaluation of OOD robustness. First, for vanilla fine-tuning, we examine the relationship between in-distribution (ID) and OOD performance. We identify three typical types that unveil the inner learning mechanism, which could potentially facilitate the forecasting of OOD robustness, correlating with the advancements on ID datasets. Then, we evaluate 5 classic methods on BOSS and find that, despite exhibiting some effectiveness in specific cases, they do not offer significant improvement compared to vanilla fine-tuning. Further, we evaluate 5 LLMs with various adaptation paradigms and find that when sufficient ID data is available, fine-tuning domain-specific models outperform LLMs on ID examples significantly. However, in the case of OOD instances, prioritizing LLMs with in-context learning yields better results. We identify that both fine-tuned small models and LLMs face challenges in effectively addressing downstream tasks. The code is public at \url{https://github.com/lifan-yuan/OOD_NLP}.
    Trajectory Alignment: Understanding the Edge of Stability Phenomenon via Bifurcation Theory. (arXiv:2307.04204v2 [cs.LG] UPDATED)
    Cohen et al. (2021) empirically study the evolution of the largest eigenvalue of the loss Hessian, also known as sharpness, along the gradient descent (GD) trajectory and observe the Edge of Stability (EoS) phenomenon. The sharpness increases at the early phase of training (referred to as progressive sharpening), and eventually saturates close to the threshold of $2 / \text{(step size)}$. In this paper, we start by demonstrating through empirical studies that when the EoS phenomenon occurs, different GD trajectories (after a proper reparameterization) align on a specific bifurcation diagram independent of initialization. We then rigorously prove this trajectory alignment phenomenon for a two-layer fully-connected linear network and a single-neuron nonlinear network trained with a single data point. Our trajectory alignment analysis establishes both progressive sharpening and EoS phenomena, encompassing and extending recent findings in the literature.
    Distribution-Free Model-Agnostic Regression Calibration via Nonparametric Methods. (arXiv:2305.12283v2 [cs.LG] UPDATED)
    In this paper, we consider the uncertainty quantification problem for regression models. Specifically, we consider an individual calibration objective for characterizing the quantiles of the prediction model. While such an objective is well-motivated from downstream tasks such as newsvendor cost, the existing methods have been largely heuristic and lack of statistical guarantee in terms of individual calibration. We show via simple examples that the existing methods focusing on population-level calibration guarantees such as average calibration or sharpness can lead to harmful and unexpected results. We propose simple nonparametric calibration methods that are agnostic of the underlying prediction model and enjoy both computational efficiency and statistical consistency. Our approach enables a better understanding of the possibility of individual calibration, and we establish matching upper and lower bounds for the calibration error of our proposed methods. Technically, our analysis combines the nonparametric analysis with a covering number argument for parametric analysis, which advances the existing theoretical analyses in the literature of nonparametric density estimation and quantile bandit problems. Importantly, the nonparametric perspective sheds new theoretical insights into regression calibration in terms of the curse of dimensionality and reconciles the existing results on the impossibility of individual calibration. To our knowledge, we make the first effort to reach both individual calibration and finite-sample guarantee with minimal assumptions in terms of conformal prediction. Numerical experiments show the advantage of such a simple approach under various metrics, and also under covariates shift. We hope our work provides a simple benchmark and a starting point of theoretical ground for future research on regression calibration.
    Neural (Tangent Kernel) Collapse. (arXiv:2305.16427v2 [cs.LG] UPDATED)
    This work bridges two important concepts: the Neural Tangent Kernel (NTK), which captures the evolution of deep neural networks (DNNs) during training, and the Neural Collapse (NC) phenomenon, which refers to the emergence of symmetry and structure in the last-layer features of well-trained classification DNNs. We adopt the natural assumption that the empirical NTK develops a block structure aligned with the class labels, i.e., samples within the same class have stronger correlations than samples from different classes. Under this assumption, we derive the dynamics of DNNs trained with mean squared (MSE) loss and break them into interpretable phases. Moreover, we identify an invariant that captures the essence of the dynamics, and use it to prove the emergence of NC in DNNs with block-structured NTK. We provide large-scale numerical experiments on three common DNN architectures and three benchmark datasets to support our theory.
    A Batch-to-Online Transformation under Random-Order Model. (arXiv:2306.07163v2 [cs.LG] UPDATED)
    We introduce a transformation framework that can be utilized to develop online algorithms with low $\epsilon$-approximate regret in the random-order model from offline approximation algorithms. We first give a general reduction theorem that transforms an offline approximation algorithm with low average sensitivity to an online algorithm with low $\epsilon$-approximate regret. We then demonstrate that offline approximation algorithms can be transformed into a low-sensitivity version using a coreset construction method. To showcase the versatility of our approach, we apply it to various problems, including online $(k,z)$-clustering, online matrix approximation, and online regression, and successfully achieve polylogarithmic $\epsilon$-approximate regret for each problem. Moreover, we show that in all three cases, our algorithm also enjoys low inconsistency, which may be desired in some online applications.
    A Comprehensive Study of Groundbreaking Machine Learning Research: Analyzing highly cited and impactful publications across six decades. (arXiv:2308.00855v2 [cs.DL] UPDATED)
    Machine learning (ML) has emerged as a prominent field of research in computer science and other related fields, thereby driving advancements in other domains of interest. As the field continues to evolve, it is crucial to understand the landscape of highly cited publications to identify key trends, influential authors, and significant contributions made thus far. In this paper, we present a comprehensive bibliometric analysis of highly cited ML publications. We collected a dataset consisting of the top-cited papers from reputable ML conferences and journals, covering a period of several years from 1959 to 2022. We employed various bibliometric techniques to analyze the data, including citation analysis, co-authorship analysis, keyword analysis, and publication trends. Our findings reveal the most influential papers, highly cited authors, and collaborative networks within the machine learning community. We identify popular research themes and uncover emerging topics that have recently gained significant attention. Furthermore, we examine the geographical distribution of highly cited publications, highlighting the dominance of certain countries in ML research. By shedding light on the landscape of highly cited ML publications, our study provides valuable insights for researchers, policymakers, and practitioners seeking to understand the key developments and trends in this rapidly evolving field.
    On Performance Discrepancies Across Local Homophily Levels in Graph Neural Networks. (arXiv:2306.05557v3 [cs.SI] UPDATED)
    Graph Neural Network (GNN) research has highlighted a relationship between high homophily (i.e., the tendency of nodes of the same class to connect) and strong predictive performance in node classification. However, recent work has found the relationship to be more nuanced, demonstrating that simple GNNs can learn in certain heterophilous settings. To resolve these conflicting findings and align closer to real-world datasets, we go beyond the assumption of a global graph homophily level and study the performance of GNNs when the local homophily level of a node deviates from the global homophily level. Through theoretical and empirical analysis, we systematically demonstrate how shifts in local homophily can introduce performance degradation, leading to performance discrepancies across local homophily levels. We ground the practical implications of this work through granular analysis on five real-world datasets with varying global homophily levels, demonstrating that (a) GNNs can fail to generalize to test nodes that deviate from the global homophily of a graph, and (b) high local homophily does not necessarily confer high performance for a node. We further show that GNNs designed for globally heterophilous graphs can alleviate performance discrepancy by improving performance across local homophily levels, offering a new perspective on how these GNNs achieve stronger global performance.
    Training-free Diffusion Model Adaptation for Variable-Sized Text-to-Image Synthesis. (arXiv:2306.08645v2 [cs.CV] UPDATED)
    Diffusion models (DMs) have recently gained attention with state-of-the-art performance in text-to-image synthesis. Abiding by the tradition in deep learning, DMs are trained and evaluated on the images with fixed sizes. However, users are demanding for various images with specific sizes and various aspect ratio. This paper focuses on adapting text-to-image diffusion models to handle such variety while maintaining visual fidelity. First we observe that, during the synthesis, lower resolution images suffer from incomplete object portrayal, while higher resolution images exhibit repetitively disordered presentation. Next, we establish a statistical relationship indicating that attention entropy changes with token quantity, suggesting that models aggregate spatial information in proportion to image resolution. The subsequent interpretation on our observations is that objects are incompletely depicted due to limited spatial information for low resolutions, while repetitively disorganized presentation arises from redundant spatial information for high resolutions. From this perspective, we propose a scaling factor to alleviate the change of attention entropy and mitigate the defective pattern observed. Extensive experimental results validate the efficacy of the proposed scaling factor, enabling models to achieve better visual effects, image quality, and text alignment. Notably, these improvements are achieved without additional training or fine-tuning techniques.
    Improving Multimodal Datasets with Image Captioning. (arXiv:2307.10350v2 [cs.LG] UPDATED)
    Massive web datasets play a key role in the success of large vision-language models like CLIP and Flamingo. However, the raw web data is noisy, and existing filtering methods to reduce noise often come at the expense of data diversity. Our work focuses on caption quality as one major source of noise, and studies how generated captions can increase the utility of web-scraped datapoints with nondescript text. Through exploring different mixing strategies for raw and generated captions, we outperform the best filtering method proposed by the DataComp benchmark by 2% on ImageNet and 4% on average across 38 tasks, given a candidate pool of 128M image-text pairs. Our best approach is also 2x better at Flickr and MS-COCO retrieval. We then analyze what makes synthetic captions an effective source of text supervision. In experimenting with different image captioning models, we also demonstrate that the performance of a model on standard image captioning benchmarks (e.g., NoCaps CIDEr) is not a reliable indicator of the utility of the captions it generates for multimodal training. Finally, our experiments with using generated captions at DataComp's large scale (1.28B image-text pairs) offer insights into the limitations of synthetic text, as well as the importance of image curation with increasing training data quantity. The synthetic captions used in our experiments are now available on HuggingFace.
    Small Total-Cost Constraints in Contextual Bandits with Knapsacks, with Application to Fairness. (arXiv:2305.15807v2 [stat.ML] UPDATED)
    We consider contextual bandit problems with knapsacks [CBwK], a problem where at each round, a scalar reward is obtained and vector-valued costs are suffered. The learner aims to maximize the cumulative rewards while ensuring that the cumulative costs are lower than some predetermined cost constraints. We assume that contexts come from a continuous set, that costs can be signed, and that the expected reward and cost functions, while unknown, may be uniformly estimated -- a typical assumption in the literature. In this setting, total cost constraints had so far to be at least of order $T^{3/4}$, where $T$ is the number of rounds, and were even typically assumed to depend linearly on $T$. We are however motivated to use CBwK to impose a fairness constraint of equalized average costs between groups: the budget associated with the corresponding cost constraints should be as close as possible to the natural deviations, of order $\sqrt{T}$. To that end, we introduce a dual strategy based on projected-gradient-descent updates, that is able to deal with total-cost constraints of the order of $\sqrt{T}$ up to poly-logarithmic terms. This strategy is more direct and simpler than existing strategies in the literature. It relies on a careful, adaptive, tuning of the step size.
    DSAC-T: Distributional Soft Actor-Critic with Three Refinements. (arXiv:2310.05858v3 [cs.LG] UPDATED)
    Reinforcement learning (RL) has proven to be highly effective in tackling complex decision-making and control tasks. However, prevalent model-free RL methods often face severe performance degradation due to the well-known overestimation issue. In response to this problem, we recently introduced an off-policy RL algorithm, called distributional soft actor-critic (DSAC or DSAC-v1), which can effectively improve the value estimation accuracy by learning a continuous Gaussian value distribution. Nonetheless, standard DSAC has its own shortcomings, including occasionally unstable learning processes and needs for task-specific reward scaling, which may hinder its overall performance and adaptability in some special tasks. This paper further introduces three important refinements to standard DSAC in order to address these shortcomings. These refinements consist of critic gradient adjusting, twin value distribution learning, and variance-based target return clipping. The modified RL algorithm is named as DSAC with three refinements (DSAC-T or DSAC-v2), and its performances are systematically evaluated on a diverse set of benchmark tasks. Without any task-specific hyperparameter tuning, DSAC-T surpasses a lot of mainstream model-free RL algorithms, including SAC, TD3, DDPG, TRPO, and PPO, in all tested environments. Additionally, DSAC-T, unlike its standard version, ensures a highly stable learning process and delivers similar performance across varying reward scales.
    Driving through the Concept Gridlock: Unraveling Explainability Bottlenecks in Automated Driving. (arXiv:2310.16639v2 [cs.CV] UPDATED)
    Concept bottleneck models have been successfully used for explainable machine learning by encoding information within the model with a set of human-defined concepts. In the context of human-assisted or autonomous driving, explainability models can help user acceptance and understanding of decisions made by the autonomous vehicle, which can be used to rationalize and explain driver or vehicle behavior. We propose a new approach using concept bottlenecks as visual features for control command predictions and explanations of user and vehicle behavior. We learn a human-understandable concept layer that we use to explain sequential driving scenes while learning vehicle control commands. This approach can then be used to determine whether a change in a preferred gap or steering commands from a human (or autonomous vehicle) is led by an external stimulus or change in preferences. We achieve competitive performance to latent visual features while gaining interpretability within our model setup.
    Mind the spikes: Benign overfitting of kernels and neural networks in fixed dimension. (arXiv:2305.14077v2 [stat.ML] UPDATED)
    The success of over-parameterized neural networks trained to near-zero training error has caused great interest in the phenomenon of benign overfitting, where estimators are statistically consistent even though they interpolate noisy training data. While benign overfitting in fixed dimension has been established for some learning methods, current literature suggests that for regression with typical kernel methods and wide neural networks, benign overfitting requires a high-dimensional setting where the dimension grows with the sample size. In this paper, we show that the smoothness of the estimators, and not the dimension, is the key: benign overfitting is possible if and only if the estimator's derivatives are large enough. We generalize existing inconsistency results to non-interpolating models and more kernels to show that benign overfitting with moderate derivatives is impossible in fixed dimension. Conversely, we show that rate-optimal benign overfitting is possible for regression with a sequence of spiky-smooth kernels with large derivatives. Using neural tangent kernels, we translate our results to wide neural networks. We prove that while infinite-width networks do not overfit benignly with the ReLU activation, this can be fixed by adding small high-frequency fluctuations to the activation function. Our experiments verify that such neural networks, while overfitting, can indeed generalize well even on low-dimensional data sets.
    SEEDS: Exponential SDE Solvers for Fast High-Quality Sampling from Diffusion Models. (arXiv:2305.14267v2 [cs.LG] UPDATED)
    A potent class of generative models known as Diffusion Probabilistic Models (DPMs) has become prominent. A forward diffusion process adds gradually noise to data, while a model learns to gradually denoise. Sampling from pre-trained DPMs is obtained by solving differential equations (DE) defined by the learnt model, a process which has shown to be prohibitively slow. Numerous efforts on speeding-up this process have consisted on crafting powerful ODE solvers. Despite being quick, such solvers do not usually reach the optimal quality achieved by available slow SDE solvers. Our goal is to propose SDE solvers that reach optimal quality without requiring several hundreds or thousands of NFEs to achieve that goal. We propose Stochastic Explicit Exponential Derivative-free Solvers (SEEDS), improving and generalizing Exponential Integrator approaches to the stochastic case on several frameworks. After carefully analyzing the formulation of exact solutions of diffusion SDEs, we craft SEEDS to analytically compute the linear part of such solutions. Inspired by the Exponential Time-Differencing method, SEEDS use a novel treatment of the stochastic components of solutions, enabling the analytical computation of their variance, and contains high-order terms allowing to reach optimal quality sampling $\sim3$-$5\times$ faster than previous SDE methods. We validate our approach on several image generation benchmarks, showing that SEEDS outperform or are competitive with previous SDE solvers. Contrary to the latter, SEEDS are derivative and training free, and we fully prove strong convergence guarantees for them.
    Gaussian Membership Inference Privacy. (arXiv:2306.07273v2 [cs.LG] UPDATED)
    We propose a novel and practical privacy notion called $f$-Membership Inference Privacy ($f$-MIP), which explicitly considers the capabilities of realistic adversaries under the membership inference attack threat model. Consequently, $f$-MIP offers interpretable privacy guarantees and improved utility (e.g., better classification accuracy). In particular, we derive a parametric family of $f$-MIP guarantees that we refer to as $\mu$-Gaussian Membership Inference Privacy ($\mu$-GMIP) by theoretically analyzing likelihood ratio-based membership inference attacks on stochastic gradient descent (SGD). Our analysis highlights that models trained with standard SGD already offer an elementary level of MIP. Additionally, we show how $f$-MIP can be amplified by adding noise to gradient updates. Our analysis further yields an analytical membership inference attack that offers two distinct advantages over previous approaches. First, unlike existing state-of-the-art attacks that require training hundreds of shadow models, our attack does not require any shadow model. Second, our analytical attack enables straightforward auditing of our privacy notion $f$-MIP. Finally, we quantify how various hyperparameters (e.g., batch size, number of model parameters) and specific data characteristics determine an attacker's ability to accurately infer a point's membership in the training set. We demonstrate the effectiveness of our method on models trained on vision and tabular datasets.
    Statistical Component Separation for Targeted Signal Recovery in Noisy Mixtures. (arXiv:2306.15012v2 [stat.ML] UPDATED)
    Separating signals from an additive mixture may be an unnecessarily hard problem when one is only interested in specific properties of a given signal. In this work, we tackle simpler "statistical component separation" problems that focus on recovering a predefined set of statistical descriptors of a target signal from a noisy mixture. Assuming access to samples of the noise process, we investigate a method devised to match the statistics of the solution candidate corrupted by noise samples with those of the observed mixture. We first analyze the behavior of this method using simple examples with analytically tractable calculations. Then, we apply it in an image denoising context employing 1) wavelet-based descriptors, 2) ConvNet-based descriptors on astrophysics and ImageNet data. In the case of 1), we show that our method better recovers the descriptors of the target data than a standard denoising method in most situations. Additionally, despite not constructed for this purpose, it performs surprisingly well in terms of peak signal-to-noise ratio on full signal reconstruction. In comparison, representation 2) appears less suitable for image denoising. Finally, we extend this method by introducing a diffusive stepwise algorithm which gives a new perspective to the initial method and leads to promising results for image denoising under specific circumstances.
    Adaptive important sampling for Deep Ritz. (arXiv:2310.17185v1 [cs.LG])
    We introduce an adaptive sampling method for the Deep Ritz method aimed at solving partial differential equations (PDEs). Two deep neural networks are used. One network is employed to approximate the solution of PDEs, while the other one is a deep generative model used to generate new collocation points to refine the training set. The adaptive sampling procedure consists of two main steps. The first step is solving the PDEs using the Deep Ritz method by minimizing an associated variational loss discretized by the collocation points in the training set. The second step involves generating a new training set, which is then used in subsequent computations to further improve the accuracy of the current approximate solution. We treat the integrand in the variational loss as an unnormalized probability density function (PDF) and approximate it using a deep generative model called bounded KRnet. The new samples and their associated PDF values are obtained from the bounded KRnet. With these new samples and their associated PDF values, the variational loss can be approximated more accurately by importance sampling. Compared to the original Deep Ritz method, the proposed adaptive method improves accuracy, especially for problems characterized by low regularity and high dimensionality. We demonstrate the effectiveness of our new method through a series of numerical experiments.
    The Wasserstein Believer: Learning Belief Updates for Partially Observable Environments through Reliable Latent Space Models. (arXiv:2303.03284v3 [cs.LG] UPDATED)
    Partially Observable Markov Decision Processes (POMDPs) are used to model environments where the full state cannot be perceived by an agent. As such the agent needs to reason taking into account the past observations and actions. However, simply remembering the full history is generally intractable due to the exponential growth in the history space. Maintaining a probability distribution that models the belief over what the true state is can be used as a sufficient statistic of the history, but its computation requires access to the model of the environment and is often intractable. While SOTA algorithms use Recurrent Neural Networks to compress the observation-action history aiming to learn a sufficient statistic, they lack guarantees of success and can lead to sub-optimal policies. To overcome this, we propose the Wasserstein Belief Updater, an RL algorithm that learns a latent model of the POMDP and an approximation of the belief update. Our approach comes with theoretical guarantees on the quality of our approximation ensuring that our outputted beliefs allow for learning the optimal value function.
    RS-Del: Edit Distance Robustness Certificates for Sequence Classifiers via Randomized Deletion. (arXiv:2302.01757v2 [cs.CR] UPDATED)
    Randomized smoothing is a leading approach for constructing classifiers that are certifiably robust against adversarial examples. Existing work on randomized smoothing has focused on classifiers with continuous inputs, such as images, where $\ell_p$-norm bounded adversaries are commonly studied. However, there has been limited work for classifiers with discrete or variable-size inputs, such as for source code, which require different threat models and smoothing mechanisms. In this work, we adapt randomized smoothing for discrete sequence classifiers to provide certified robustness against edit distance-bounded adversaries. Our proposed smoothing mechanism randomized deletion (RS-Del) applies random deletion edits, which are (perhaps surprisingly) sufficient to confer robustness against adversarial deletion, insertion and substitution edits. Our proof of certification deviates from the established Neyman-Pearson approach, which is intractable in our setting, and is instead organized around longest common subsequences. We present a case study on malware detection--a binary classification problem on byte sequences where classifier evasion is a well-established threat model. When applied to the popular MalConv malware detection model, our smoothing mechanism RS-Del achieves a certified accuracy of 91% at an edit distance radius of 128 bytes.
    Investigating Topological Order using Recurrent Neural Networks. (arXiv:2303.11207v3 [cond-mat.str-el] UPDATED)
    Recurrent neural networks (RNNs), originally developed for natural language processing, hold great promise for accurately describing strongly correlated quantum many-body systems. Here, we employ 2D RNNs to investigate two prototypical quantum many-body Hamiltonians exhibiting topological order. Specifically, we demonstrate that RNN wave functions can effectively capture the topological order of the toric code and a Bose-Hubbard spin liquid on the kagome lattice by estimating their topological entanglement entropies. We also find that RNNs favor coherent superpositions of minimally-entangled states over minimally-entangled states themselves. Overall, our findings demonstrate that RNN wave functions constitute a powerful tool to study phases of matter beyond Landau's symmetry-breaking paradigm.
    What Makes Data Suitable for a Locally Connected Neural Network? A Necessary and Sufficient Condition Based on Quantum Entanglement. (arXiv:2303.11249v4 [cs.LG] UPDATED)
    The question of what makes a data distribution suitable for deep learning is a fundamental open problem. Focusing on locally connected neural networks (a prevalent family of architectures that includes convolutional and recurrent neural networks as well as local self-attention models), we address this problem by adopting theoretical tools from quantum physics. Our main theoretical result states that a certain locally connected neural network is capable of accurate prediction over a data distribution if and only if the data distribution admits low quantum entanglement under certain canonical partitions of features. As a practical application of this result, we derive a preprocessing method for enhancing the suitability of a data distribution to locally connected neural networks. Experiments with widespread models over various datasets demonstrate our findings. We hope that our use of quantum entanglement will encourage further adoption of tools from physics for formally reasoning about the relation between deep learning and real-world data.
    Self-Evaluation Guided Beam Search for Reasoning. (arXiv:2305.00633v3 [cs.CL] UPDATED)
    Breaking down a problem into intermediate steps has demonstrated impressive performance in Large Language Model (LLM) reasoning. However, the growth of the reasoning chain introduces uncertainty and error accumulation, making it challenging to elicit accurate final results. To tackle this challenge of uncertainty in multi-step reasoning, we introduce a stepwise self-evaluation mechanism to guide and calibrate the reasoning process of LLMs. We propose a decoding algorithm integrating the self-evaluation guidance via stochastic beam search. The self-evaluation guidance serves as a better-calibrated automatic criterion, facilitating an efficient search in the reasoning space and resulting in superior prediction quality. Stochastic beam search balances exploitation and exploration of the search space with temperature-controlled randomness. Our approach surpasses the corresponding Codex-backboned baselines in few-shot accuracy by $6.34\%$, $9.56\%$, and $5.46\%$ on the GSM8K, AQuA, and StrategyQA benchmarks, respectively. Experiment results with Llama-2 on arithmetic reasoning demonstrate the efficiency of our method in outperforming the baseline methods with comparable computational budgets. Further analysis in multi-step reasoning finds our self-evaluation guidance pinpoints logic failures and leads to higher consistency and robustness. Our code is publicly available at https://guideddecoding.github.io/.
    Improving the Timing Resolution of Positron Emission Tomography Detectors Using Boosted Learning -- A Residual Physics Approach. (arXiv:2302.01681v2 [cs.LG] UPDATED)
    Artificial intelligence (AI) is entering medical imaging, mainly enhancing image reconstruction. Nevertheless, improvements throughout the entire processing, from signal detection to computation, potentially offer significant benefits. This work presents a novel and versatile approach to detector optimization using machine learning (ML) and residual physics. We apply the concept to positron emission tomography (PET), intending to improve the coincidence time resolution (CTR). PET visualizes metabolic processes in the body by detecting photons with scintillation detectors. Improved CTR performance offers the advantage of reducing radioactive dose exposure for patients. Modern PET detectors with sophisticated concepts and read-out topologies represent complex physical and electronic systems requiring dedicated calibration techniques. Traditional methods primarily depend on analytical formulations successfully describing the main detector characteristics. However, when accounting for higher-order effects, additional complexities arise matching theoretical models to experimental reality. Our work addresses this challenge by combining traditional calibration with AI and residual physics, presenting a highly promising approach. We present a residual physics-based strategy using gradient tree boosting and physics-guided data generation. The explainable AI framework SHapley Additive exPlanations (SHAP) was used to identify known physical effects with learned patterns. In addition, the models were tested against basic physical laws. We were able to improve the CTR significantly (more than 20%) for clinically relevant detectors of 19 mm height, reaching CTRs of 185 ps (450-550 keV).
    Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation. (arXiv:2305.11685v2 [eess.AS] UPDATED)
    Transformer-based speech self-supervised learning (SSL) models, such as HuBERT, show surprising performance in various speech processing tasks. However, huge number of parameters in speech SSL models necessitate the compression to a more compact model for wider usage in academia or small companies. In this study, we suggest to reuse attention maps across the Transformer layers, so as to remove key and query parameters while retaining the number of layers. Furthermore, we propose a novel masking distillation strategy to improve the student model's speech representation quality. We extend the distillation loss to utilize both masked and unmasked speech frames to fully leverage the teacher model's high-quality representation. Our universal compression strategy yields the student model that achieves phoneme error rate (PER) of 7.72% and word error rate (WER) of 9.96% on the SUPERB benchmark.
    Large-Scale Gaussian Processes via Alternating Projection. (arXiv:2310.17137v1 [cs.LG])
    Gaussian process (GP) hyperparameter optimization requires repeatedly solving linear systems with $n \times n$ kernel matrices. To address the prohibitive $\mathcal{O}(n^3)$ time complexity, recent work has employed fast iterative numerical methods, like conjugate gradients (CG). However, as datasets increase in magnitude, the corresponding kernel matrices become increasingly ill-conditioned and still require $\mathcal{O}(n^2)$ space without partitioning. Thus, while CG increases the size of datasets GPs can be trained on, modern datasets reach scales beyond its applicability. In this work, we propose an iterative method which only accesses subblocks of the kernel matrix, effectively enabling \emph{mini-batching}. Our algorithm, based on alternating projection, has $\mathcal{O}(n)$ per-iteration time and space complexity, solving many of the practical challenges of scaling GPs to very large datasets. Theoretically, we prove our method enjoys linear convergence and empirically we demonstrate its robustness to ill-conditioning. On large-scale benchmark datasets up to four million datapoints our approach accelerates training by a factor of 2$\times$ to 27$\times$ compared to CG.
    Understanding and Addressing the Pitfalls of Bisimulation-based Representations in Offline Reinforcement Learning. (arXiv:2310.17139v1 [cs.LG])
    While bisimulation-based approaches hold promise for learning robust state representations for Reinforcement Learning (RL) tasks, their efficacy in offline RL tasks has not been up to par. In some instances, their performance has even significantly underperformed alternative methods. We aim to understand why bisimulation methods succeed in online settings, but falter in offline tasks. Our analysis reveals that missing transitions in the dataset are particularly harmful to the bisimulation principle, leading to ineffective estimation. We also shed light on the critical role of reward scaling in bounding the scale of bisimulation measurements and of the value error they induce. Based on these findings, we propose to apply the expectile operator for representation learning to our offline RL setting, which helps to prevent overfitting to incomplete data. Meanwhile, by introducing an appropriate reward scaling strategy, we avoid the risk of feature collapse in representation space. We implement these recommendations on two state-of-the-art bisimulation-based algorithms, MICo and SimSR, and demonstrate performance gains on two benchmark suites: D4RL and Visual D4RL. Codes are provided at \url{https://github.com/zanghyu/Offline_Bisimulation}.
    Beyond MLE: Convex Learning for Text Generation. (arXiv:2310.17217v1 [cs.CL])
    Maximum likelihood estimation (MLE) is a statistical method used to estimate the parameters of a probability distribution that best explain the observed data. In the context of text generation, MLE is often used to train generative language models, which can then be used to generate new text. However, we argue that MLE is not always necessary and optimal, especially for closed-ended text generation tasks like machine translation. In these tasks, the goal of model is to generate the most appropriate response, which does not necessarily require it to estimate the entire data distribution with MLE. To this end, we propose a novel class of training objectives based on convex functions, which enables text generation models to focus on highly probable outputs without having to estimate the entire data distribution. We investigate the theoretical properties of the optimal predicted distribution when applying convex functions to the loss, demonstrating that convex functions can sharpen the optimal distribution, thereby enabling the model to better capture outputs with high probabilities. Experiments on various text generation tasks and models show the effectiveness of our approach. It enables autoregressive models to bridge the gap between greedy and beam search, and facilitates the learning of non-autoregressive models with a maximum improvement of 9+ BLEU points. Moreover, our approach also exhibits significant impact on large language models (LLMs), substantially enhancing their generative capability on various tasks. Source code is available at \url{https://github.com/ictnlp/Convex-Learning}.
    Fairness and bias correction in machine learning for depression prediction: results from four study populations. (arXiv:2211.05321v3 [cs.LG] UPDATED)
    A significant level of stigma and inequality exists in mental healthcare, especially in under-served populations. Inequalities are reflected in the data collected for scientific purposes. When not properly accounted for, machine learning (ML) models leart from data can reinforce these structural inequalities or biases. Here, we present a systematic study of bias in ML models designed to predict depression in four different case studies covering different countries and populations. We find that standard ML approaches show regularly biased behaviors. We also show that mitigation techniques, both standard and our own post-hoc method, can be effective in reducing the level of unfair bias. No single best ML model for depression prediction provides equality of outcomes. This emphasizes the importance of analyzing fairness during model selection and transparent reporting about the impact of debiasing interventions. Finally, we provide practical recommendations to develop bias-aware ML models for depression risk prediction.
    Learning Rate Free Bayesian Inference in Constrained Domains. (arXiv:2305.14943v2 [stat.ML] UPDATED)
    We introduce a suite of new particle-based algorithms for sampling on constrained domains which are entirely learning rate free. Our approach leverages coin betting ideas from convex optimisation, and the viewpoint of constrained sampling as a mirrored optimisation problem on the space of probability measures. Based on this viewpoint, we also introduce a unifying framework for several existing constrained sampling algorithms, including mirrored Langevin dynamics and mirrored Stein variational gradient descent. We demonstrate the performance of our algorithms on a range of numerical examples, including sampling from targets on the simplex, sampling with fairness constraints, and constrained sampling problems in post-selection inference. Our results indicate that our algorithms achieve competitive performance with existing constrained sampling methods, without the need to tune any hyperparameters.
    Simulation based stacking. (arXiv:2310.17009v1 [stat.ME])
    Simulation-based inference has been popular for amortized Bayesian computation. It is typical to have more than one posterior approximation, from different inference algorithms, different architectures, or simply the randomness of initialization and stochastic gradients. With a provable asymptotic guarantee, we present a general stacking framework to make use of all available posterior approximations. Our stacking method is able to combine densities, simulation draws, confidence intervals, and moments, and address the overall precision, calibration, coverage, and bias at the same time. We illustrate our method on several benchmark simulations and a challenging cosmological inference task.
    Automating lichen monitoring in ecological studies using instance segmentation of time-lapse images. (arXiv:2310.17080v1 [cs.CV])
    Lichens are symbiotic organisms composed of fungi, algae, and/or cyanobacteria that thrive in a variety of environments. They play important roles in carbon and nitrogen cycling, and contribute directly and indirectly to biodiversity. Ecologists typically monitor lichens by using them as indicators to assess air quality and habitat conditions. In particular, epiphytic lichens, which live on trees, are key markers of air quality and environmental health. A new method of monitoring epiphytic lichens involves using time-lapse cameras to gather images of lichen populations. These cameras are used by ecologists in Newfoundland and Labrador to subsequently analyze and manually segment the images to determine lichen thalli condition and change. These methods are time-consuming and susceptible to observer bias. In this work, we aim to automate the monitoring of lichens over extended periods and to estimate their biomass and condition to facilitate the task of ecologists. To accomplish this, our proposed framework uses semantic segmentation with an effective training approach to automate monitoring and biomass estimation of epiphytic lichens on time-lapse images. We show that our method has the potential to significantly improve the accuracy and efficiency of lichen population monitoring, making it a valuable tool for forest ecologists and environmental scientists to evaluate the impact of climate change on Canada's forests. To the best of our knowledge, this is the first time that such an approach has been used to assist ecologists in monitoring and analyzing epiphytic lichens.
    ZipLM: Inference-Aware Structured Pruning of Language Models. (arXiv:2302.04089v2 [cs.LG] UPDATED)
    The breakthrough performance of large language models (LLMs) comes with major computational footprints and high deployment costs. In this paper, we progress towards resolving this problem by proposing a novel structured compression approach for LLMs, called ZipLM. ZipLM achieves state-of-the-art accuracy-vs-speedup, while matching a set of desired target runtime speedups in any given inference environment. Specifically, given a model, a dataset, an inference environment, as well as a set of speedup targets, ZipLM iteratively identifies and removes components with the worst loss-runtime trade-off. Unlike prior methods that specialize in either the post-training/one-shot or the gradual compression setting, and only for specific families of models such as BERT (encoder) or GPT (decoder), ZipLM produces state-of-the-art compressed models across all these settings. Furthermore, ZipLM achieves superior results for a fraction of the computational cost relative to prior distillation and pruning techniques, making it a cost-effective approach for generating an entire family of smaller, faster, and highly accurate models, guaranteed to meet the desired inference specifications. In particular, ZipLM outperforms all prior BERT-base distillation and pruning techniques, such as CoFi, MiniLM, and TinyBERT. Moreover, it matches the performance of the heavily optimized MobileBERT model, obtained via extensive architecture search, by simply pruning the baseline BERT-large model. When compressing GPT2, ZipLM outperforms DistilGPT2 while being 60% smaller and 30% faster. Our code is available at: https://github.com/IST-DASLab/ZipLM.
    Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult. (arXiv:2310.17087v1 [cs.LG])
    Large learning rates, when applied to gradient descent for nonconvex optimization, yield various implicit biases including the edge of stability (Cohen et al., 2021), balancing (Wang et al., 2022), and catapult (Lewkowycz et al., 2020). These phenomena cannot be well explained by classical optimization theory. Though significant theoretical progress has been made in understanding these implicit biases, it remains unclear for which objective functions would they occur. This paper provides an initial step in answering this question, namely that these implicit biases are in fact various tips of the same iceberg. They occur when the objective function of optimization has some good regularity, which, in combination with a provable preference of large learning rate gradient descent for moving toward flatter regions, results in these nontrivial dynamical phenomena. To establish this result, we develop a new global convergence theory under large learning rates, for a family of nonconvex functions without globally Lipschitz continuous gradient, which was typically assumed in existing convergence analysis. A byproduct is the first non-asymptotic convergence rate bound for large-learning-rate gradient descent optimization of nonconvex functions. We also validate our theory with experiments on neural networks, where different losses, activation functions, and batch normalization all can significantly affect regularity and lead to very different training dynamics.
    Explainable Spatio-Temporal Graph Neural Networks. (arXiv:2310.17149v1 [cs.LG])
    Spatio-temporal graph neural networks (STGNNs) have gained popularity as a powerful tool for effectively modeling spatio-temporal dependencies in diverse real-world urban applications, including intelligent transportation and public safety. However, the black-box nature of STGNNs limits their interpretability, hindering their application in scenarios related to urban resource allocation and policy formulation. To bridge this gap, we propose an Explainable Spatio-Temporal Graph Neural Networks (STExplainer) framework that enhances STGNNs with inherent explainability, enabling them to provide accurate predictions and faithful explanations simultaneously. Our framework integrates a unified spatio-temporal graph attention network with a positional information fusion layer as the STG encoder and decoder, respectively. Furthermore, we propose a structure distillation approach based on the Graph Information Bottleneck (GIB) principle with an explainable objective, which is instantiated by the STG encoder and decoder. Through extensive experiments, we demonstrate that our STExplainer outperforms state-of-the-art baselines in terms of predictive accuracy and explainability metrics (i.e., sparsity and fidelity) on traffic and crime prediction tasks. Furthermore, our model exhibits superior representation ability in alleviating data missing and sparsity issues. The implementation code is available at: https://github.com/HKUDS/STExplainer.
    math-PVS: A Large Language Model Framework to Map Scientific Publications to PVS Theories. (arXiv:2310.17064v1 [cs.AI])
    As artificial intelligence (AI) gains greater adoption in a wide variety of applications, it has immense potential to contribute to mathematical discovery, by guiding conjecture generation, constructing counterexamples, assisting in formalizing mathematics, and discovering connections between different mathematical areas, to name a few. While prior work has leveraged computers for exhaustive mathematical proof search, recent efforts based on large language models (LLMs) aspire to position computing platforms as co-contributors in the mathematical research process. Despite their current limitations in logic and mathematical tasks, there is growing interest in melding theorem proving systems with foundation models. This work investigates the applicability of LLMs in formalizing advanced mathematical concepts and proposes a framework that can critically review and check mathematical reasoning in research papers. Given the noted reasoning shortcomings of LLMs, our approach synergizes the capabilities of proof assistants, specifically PVS, with LLMs, enabling a bridge between textual descriptions in academic papers and formal specifications in PVS. By harnessing the PVS environment, coupled with data ingestion and conversion mechanisms, we envision an automated process, called \emph{math-PVS}, to extract and formalize mathematical theorems from research papers, offering an innovative tool for academic review and discovery.
    LLM4DyG: Can Large Language Models Solve Problems on Dynamic Graphs?. (arXiv:2310.17110v1 [cs.LG])
    In an era marked by the increasing adoption of Large Language Models (LLMs) for various tasks, there is a growing focus on exploring LLMs' capabilities in handling web data, particularly graph data. Dynamic graphs, which capture temporal network evolution patterns, are ubiquitous in real-world web data. Evaluating LLMs' competence in understanding spatial-temporal information on dynamic graphs is essential for their adoption in web applications, which remains unexplored in the literature. In this paper, we bridge the gap via proposing to evaluate LLMs' spatial-temporal understanding abilities on dynamic graphs, to the best of our knowledge, for the first time. Specifically, we propose the LLM4DyG benchmark, which includes nine specially designed tasks considering the capability evaluation of LLMs from both temporal and spatial dimensions. Then, we conduct extensive experiments to analyze the impacts of different data generators, data statistics, prompting techniques, and LLMs on the model performance. Finally, we propose Disentangled Spatial-Temporal Thoughts (DST2) for LLMs on dynamic graphs to enhance LLMs' spatial-temporal understanding abilities. Our main observations are: 1) LLMs have preliminary spatial-temporal understanding abilities on dynamic graphs, 2) Dynamic graph tasks show increasing difficulties for LLMs as the graph size and density increase, while not sensitive to the time span and data generation mechanism, 3) the proposed DST2 prompting method can help to improve LLMs' spatial-temporal understanding abilities on dynamic graphs for most tasks. The data and codes will be open-sourced at publication time.
    Hierarchical Semi-Implicit Variational Inference with Application to Diffusion Model Acceleration. (arXiv:2310.17153v1 [cs.LG])
    Semi-implicit variational inference (SIVI) has been introduced to expand the analytical variational families by defining expressive semi-implicit distributions in a hierarchical manner. However, the single-layer architecture commonly used in current SIVI methods can be insufficient when the target posterior has complicated structures. In this paper, we propose hierarchical semi-implicit variational inference, called HSIVI, which generalizes SIVI to allow more expressive multi-layer construction of semi-implicit distributions. By introducing auxiliary distributions that interpolate between a simple base distribution and the target distribution, the conditional layers can be trained by progressively matching these auxiliary distributions one layer after another. Moreover, given pre-trained score networks, HSIVI can be used to accelerate the sampling process of diffusion models with the score matching objective. We show that HSIVI significantly enhances the expressiveness of SIVI on several Bayesian inference problems with complicated target distributions. When used for diffusion model acceleration, we show that HSIVI can produce high quality samples comparable to or better than the existing fast diffusion model based samplers with a small number of function evaluations on various datasets.
    Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time. (arXiv:2310.17157v1 [cs.LG])
    Large language models (LLMs) with hundreds of billions of parameters have sparked a new wave of exciting AI applications. However, they are computationally expensive at inference time. Sparsity is a natural approach to reduce this cost, but existing methods either require costly retraining, have to forgo LLM's in-context learning ability, or do not yield wall-clock time speedup on modern hardware. We hypothesize that contextual sparsity, which are small, input-dependent sets of attention heads and MLP parameters that yield approximately the same output as the dense model for a given input, can address these issues. We show that contextual sparsity exists, that it can be accurately predicted, and that we can exploit it to speed up LLM inference in wall-clock time without compromising LLM's quality or in-context learning ability. Based on these insights, we propose DejaVu, a system that uses a low-cost algorithm to predict contextual sparsity on the fly given inputs to each layer, along with an asynchronous and hardware-aware implementation that speeds up LLM inference. We validate that DejaVu can reduce the inference latency of OPT-175B by over 2X compared to the state-of-the-art FasterTransformer, and over 6X compared to the widely used Hugging Face implementation, without compromising model quality. The code is available at https://github.com/FMInference/DejaVu.
    Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity. (arXiv:2310.17247v1 [cs.LG])
    In some settings neural networks exhibit a phenomenon known as grokking, where they achieve perfect or near-perfect accuracy on the validation set long after the same performance has been achieved on the training set. In this paper, we discover that grokking is not limited to neural networks but occurs in other settings such as Gaussian process (GP) classification, GP regression and linear regression. We also uncover a mechanism by which to induce grokking on algorithmic datasets via the addition of dimensions containing spurious information. The presence of the phenomenon in non-neural architectures provides evidence that grokking is not specific to SGD or weight norm regularisation. Instead, grokking may be possible in any setting where solution search is guided by complexity and error. Based on this insight and further trends we see in the training trajectories of a Bayesian neural network (BNN) and GP regression model, we make progress towards a more general theory of grokking. Specifically, we hypothesise that the phenomenon is governed by the accessibility of certain regions in the error and complexity landscapes.
    Bayesian Neural Networks for Geothermal Resource Assessment: Prediction with Uncertainty. (arXiv:2209.15543v3 [physics.geo-ph] UPDATED)
    We consider the application of machine learning to the evaluation of geothermal resource potential. A supervised learning problem is defined where maps of 10 geological and geophysical features within the state of Nevada, USA are used to define geothermal potential across a broad region. We have available a relatively small set of positive training sites (known resources or active power plants) and negative training sites (known drill sites with unsuitable geothermal conditions) and use these to constrain and optimize artificial neural networks for this classification task. The main objective is to predict the geothermal resource potential at unknown sites within a large geographic area where the defining features are known. These predictions could be used to target promising areas for further detailed investigations. We describe the evolution of our work from defining a specific neural network architecture to training and optimization trials. Upon analysis we expose the inevitable problems of model variability and resulting prediction uncertainty. Finally, to address these problems we apply the concept of Bayesian neural networks, a heuristic approach to regularization in network training, and make use of the practical interpretation of the formal uncertainty measures they provide.
    Topic Segmentation of Semi-Structured and Unstructured Conversational Datasets using Language Models. (arXiv:2310.17120v1 [cs.CL])
    Breaking down a document or a conversation into multiple contiguous segments based on its semantic structure is an important and challenging problem in NLP, which can assist many downstream tasks. However, current works on topic segmentation often focus on segmentation of structured texts. In this paper, we comprehensively analyze the generalization capabilities of state-of-the-art topic segmentation models on unstructured texts. We find that: (a) Current strategies of pre-training on a large corpus of structured text such as Wiki-727K do not help in transferability to unstructured conversational data. (b) Training from scratch with only a relatively small-sized dataset of the target unstructured domain improves the segmentation results by a significant margin. We stress-test our proposed Topic Segmentation approach by experimenting with multiple loss functions, in order to mitigate effects of imbalance in unstructured conversational datasets. Our empirical evaluation indicates that Focal Loss function is a robust alternative to Cross-Entropy and re-weighted Cross-Entropy loss function when segmenting unstructured and semi-structured chats.
    Codebook Features: Sparse and Discrete Interpretability for Neural Networks. (arXiv:2310.17230v1 [cs.LG])
    Understanding neural networks is challenging in part because of the dense, continuous nature of their hidden states. We explore whether we can train neural networks to have hidden states that are sparse, discrete, and more interpretable by quantizing their continuous features into what we call codebook features. Codebook features are produced by finetuning neural networks with vector quantization bottlenecks at each layer, producing a network whose hidden features are the sum of a small number of discrete vector codes chosen from a larger codebook. Surprisingly, we find that neural networks can operate under this extreme bottleneck with only modest degradation in performance. This sparse, discrete bottleneck also provides an intuitive way of controlling neural network behavior: first, find codes that activate when the desired behavior is present, then activate those same codes during generation to elicit that behavior. We validate our approach by training codebook Transformers on several different datasets. First, we explore a finite state machine dataset with far more hidden states than neurons. In this setting, our approach overcomes the superposition problem by assigning states to distinct codes, and we find that we can make the neural network behave as if it is in a different state by activating the code for that state. Second, we train Transformer language models with up to 410M parameters on two natural language datasets. We identify codes in these models representing diverse, disentangled concepts (ranging from negative emotions to months of the year) and find that we can guide the model to generate different topics by activating the appropriate codes during inference. Overall, codebook features appear to be a promising unit of analysis and control for neural networks and interpretability. Our codebase and models are open-sourced at https://github.com/taufeeque9/codebook-features.
    MaxEnt Loss: Constrained Maximum Entropy for Calibration under Out-of-Distribution Shift. (arXiv:2310.17159v1 [cs.LG])
    We present a new loss function that addresses the out-of-distribution (OOD) calibration problem. While many objective functions have been proposed to effectively calibrate models in-distribution, our findings show that they do not always fare well OOD. Based on the Principle of Maximum Entropy, we incorporate helpful statistical constraints observed during training, delivering better model calibration without sacrificing accuracy. We provide theoretical analysis and show empirically that our method works well in practice, achieving state-of-the-art calibration on both synthetic and real-world benchmarks.
    Improving Neural Additive Models with Bayesian Principles. (arXiv:2305.16905v2 [stat.ML] UPDATED)
    Neural additive models (NAMs) can improve the interpretability of deep neural networks by handling input features in separate additive sub-networks. However, they lack inherent mechanisms that provide calibrated uncertainties and enable selection of relevant features and interactions. Approaching NAMs from a Bayesian perspective, we enhance them in three primary ways, namely by a) providing credible intervals for the individual additive sub-networks; b) estimating the marginal likelihood to perform an implicit selection of features via an empirical Bayes procedure; and c) enabling a ranking of feature pairs as candidates for second-order interaction in fine-tuned models. In particular, we develop Laplace-approximated NAMs (LA-NAMs), which show improved empirical performance on tabular datasets and challenging real-world medical tasks.
    Language-based Action Concept Spaces Improve Video Self-Supervised Learning. (arXiv:2307.10922v3 [cs.CV] UPDATED)
    Recent contrastive language image pre-training has led to learning highly transferable and robust image representations. However, adapting these models to video domains with minimal supervision remains an open problem. We explore a simple step in that direction, using language tied self-supervised learning to adapt an image CLIP model to the video domain. A backbone modified for temporal modeling is trained under self-distillation settings with train objectives operating in an action concept space. Feature vectors of various action concepts extracted from a language encoder using relevant textual prompts construct this space. We introduce two train objectives, concept distillation and concept alignment, that retain generality of original representations while enforcing relations between actions and their attributes. Our approach improves zero-shot and linear probing performance on three action recognition benchmarks.
    Seeing is not Believing: Robust Reinforcement Learning against Spurious Correlation. (arXiv:2307.07907v2 [cs.LG] UPDATED)
    Robustness has been extensively studied in reinforcement learning (RL) to handle various forms of uncertainty such as random perturbations, rare events, and malicious attacks. In this work, we consider one critical type of robustness against spurious correlation, where different portions of the state do not have correlations induced by unobserved confounders. These spurious correlations are ubiquitous in real-world tasks, for instance, a self-driving car usually observes heavy traffic in the daytime and light traffic at night due to unobservable human activity. A model that learns such useless or even harmful correlation could catastrophically fail when the confounder in the test case deviates from the training one. Although motivated, enabling robustness against spurious correlation poses significant challenges since the uncertainty set, shaped by the unobserved confounder and causal structure, is difficult to characterize and identify. Existing robust algorithms that assume simple and unstructured uncertainty sets are therefore inadequate to address this challenge. To solve this issue, we propose Robust State-Confounded Markov Decision Processes (RSC-MDPs) and theoretically demonstrate its superiority in avoiding learning spurious correlations compared with other robust RL counterparts. We also design an empirical algorithm to learn the robust optimal policy for RSC-MDPs, which outperforms all baselines in eight realistic self-driving and manipulation tasks.
    MathVista: Evaluating Math Reasoning in Visual Contexts with GPT-4V, Bard, and Other Large Multimodal Models. (arXiv:2310.02255v2 [cs.CV] UPDATED)
    Large Language Models (LLMs) and Large Multimodal Models (LMMs) exhibit impressive problem-solving skills in many tasks and domains, but their ability in mathematical reasoning in visual contexts has not been systematically studied. To bridge this gap, we present MathVista, a benchmark designed to combine challenges from diverse mathematical and visual tasks. It consists of 6,141 examples, derived from 28 existing multimodal datasets involving mathematics and 3 newly created datasets (i.e., IQTest, FunctionQA, and PaperQA). Completing these tasks requires fine-grained, deep visual understanding and compositional reasoning, which all state-of-the-art foundation models find challenging. With MathVista, we have conducted a comprehensive, quantitative evaluation of 12 prominent foundation models. The best-performing GPT-4V model achieves an overall accuracy of 49.9%, substantially outperforming Bard, the second-best performer, by 15.1%. Our in-depth analysis reveals that the superiority of GPT-4V is mainly attributed to its enhanced visual perception and mathematical reasoning. However, GPT-4V still falls short of human performance by 10.4%, as it often struggles to understand complex figures and perform rigorous reasoning. This significant gap underscores the critical role that MathVista will play in the development of general-purpose AI agents capable of tackling mathematically intensive and visually rich real-world tasks. We further explore the new ability of self-verification, the application of self-consistency, and the interactive chatbot capabilities of GPT-4V, highlighting its promising potential for future research. The project is available at https://mathvista.github.io/.
    Look Beneath the Surface: Exploiting Fundamental Symmetry for Sample-Efficient Offline RL. (arXiv:2306.04220v4 [cs.LG] UPDATED)
    Offline reinforcement learning (RL) offers an appealing approach to real-world tasks by learning policies from pre-collected datasets without interacting with the environment. However, the performance of existing offline RL algorithms heavily depends on the scale and state-action space coverage of datasets. Real-world data collection is often expensive and uncontrollable, leading to small and narrowly covered datasets and posing significant challenges for practical deployments of offline RL. In this paper, we provide a new insight that leveraging the fundamental symmetry of system dynamics can substantially enhance offline RL performance under small datasets. Specifically, we propose a Time-reversal symmetry (T-symmetry) enforced Dynamics Model (TDM), which establishes consistency between a pair of forward and reverse latent dynamics. TDM provides both well-behaved representations for small datasets and a new reliability measure for OOD samples based on compliance with the T-symmetry. These can be readily used to construct a new offline RL algorithm (TSRL) with less conservative policy constraints and a reliable latent space data augmentation procedure. Based on extensive experiments, we find TSRL achieves great performance on small benchmark datasets with as few as 1% of the original samples, which significantly outperforms the recent offline RL algorithms in terms of data efficiency and generalizability.Code is available at: https://github.com/pcheng2/TSRL
    A framework for benchmarking clustering algorithms. (arXiv:2209.09493v3 [cs.LG] UPDATED)
    The evaluation of clustering algorithms can involve running them on a variety of benchmark problems, and comparing their outputs to the reference, ground-truth groupings provided by experts. Unfortunately, many research papers and graduate theses consider only a small number of datasets. Also, the fact that there can be many equally valid ways to cluster a given problem set is rarely taken into account. In order to overcome these limitations, we have developed a framework whose aim is to introduce a consistent methodology for testing clustering algorithms. Furthermore, we have aggregated, polished, and standardised many clustering benchmark dataset collections referred to across the machine learning and data mining literature, and included new datasets of different dimensionalities, sizes, and cluster types. An interactive datasets explorer, the documentation of the Python API, a description of the ways to interact with the framework from other programming languages such as R or MATLAB, and other details are all provided at .
    COMEDIAN: Self-Supervised Learning and Knowledge Distillation for Action Spotting using Transformers. (arXiv:2309.01270v2 [cs.CV] UPDATED)
    We present COMEDIAN, a novel pipeline to initialize spatiotemporal transformers for action spotting, which involves self-supervised learning and knowledge distillation. Action spotting is a timestamp-level temporal action detection task. Our pipeline consists of three steps, with two initialization stages. First, we perform self-supervised initialization of a spatial transformer using short videos as input. Additionally, we initialize a temporal transformer that enhances the spatial transformer's outputs with global context through knowledge distillation from a pre-computed feature bank aligned with each short video segment. In the final step, we fine-tune the transformers to the action spotting task. The experiments, conducted on the SoccerNet-v2 dataset, demonstrate state-of-the-art performance and validate the effectiveness of COMEDIAN's pretraining paradigm. Our results highlight several advantages of our pretraining pipeline, including improved performance and faster convergence compared to non-pretrained models.
    Evaluating Bias and Fairness in Gender-Neutral Pretrained Vision-and-Language Models. (arXiv:2310.17530v1 [cs.CV])
    Pretrained machine learning models are known to perpetuate and even amplify existing biases in data, which can result in unfair outcomes that ultimately impact user experience. Therefore, it is crucial to understand the mechanisms behind those prejudicial biases to ensure that model performance does not result in discriminatory behaviour toward certain groups or populations. In this work, we define gender bias as our case study. We quantify bias amplification in pretraining and after fine-tuning on three families of vision-and-language models. We investigate the connection, if any, between the two learning stages, and evaluate how bias amplification reflects on model performance. Overall, we find that bias amplification in pretraining and after fine-tuning are independent. We then examine the effect of continued pretraining on gender-neutral data, finding that this reduces group disparities, i.e., promotes fairness, on VQAv2 and retrieval tasks without significantly compromising task performance.
    Bifurcations and loss jumps in RNN training. (arXiv:2310.17561v1 [cs.LG])
    Recurrent neural networks (RNNs) are popular machine learning tools for modeling and forecasting sequential data and for inferring dynamical systems (DS) from observed time series. Concepts from DS theory (DST) have variously been used to further our understanding of both, how trained RNNs solve complex tasks, and the training process itself. Bifurcations are particularly important phenomena in DS, including RNNs, that refer to topological (qualitative) changes in a system's dynamical behavior as one or more of its parameters are varied. Knowing the bifurcation structure of an RNN will thus allow to deduce many of its computational and dynamical properties, like its sensitivity to parameter variations or its behavior during training. In particular, bifurcations may account for sudden loss jumps observed in RNN training that could severely impede the training process. Here we first mathematically prove for a particular class of ReLU-based RNNs that certain bifurcations are indeed associated with loss gradients tending toward infinity or zero. We then introduce a novel heuristic algorithm for detecting all fixed points and k-cycles in ReLU-based RNNs and their existence and stability regions, hence bifurcation manifolds in parameter space. In contrast to previous numerical algorithms for finding fixed points and common continuation methods, our algorithm provides exact results and returns fixed points and cycles up to high orders with surprisingly good scaling behavior. We exemplify the algorithm on the analysis of the training process of RNNs, and find that the recently introduced technique of generalized teacher forcing completely avoids certain types of bifurcations in training. Thus, besides facilitating the DST analysis of trained RNNs, our algorithm provides a powerful instrument for analyzing the training process itself.
    Time-Conditioned Generative Modeling of Object-Centric Representations for Video Decomposition and Prediction. (arXiv:2301.08951v4 [cs.CV] UPDATED)
    When perceiving the world from multiple viewpoints, humans have the ability to reason about the complete objects in a compositional manner even when an object is completely occluded from certain viewpoints. Meanwhile, humans are able to imagine novel views after observing multiple viewpoints. Recent remarkable advances in multi-view object-centric learning still leaves some unresolved problems: 1) The shapes of partially or completely occluded objects can not be well reconstructed. 2) The novel viewpoint prediction depends on expensive viewpoint annotations rather than implicit rules in view representations. In this paper, we introduce a time-conditioned generative model for videos. To reconstruct the complete shape of an object accurately, we enhance the disentanglement between the latent representations of objects and views, where the latent representations of time-conditioned views are jointly inferred with a Transformer and then are input to a sequential extension of Slot Attention to learn object-centric representations. In addition, Gaussian processes are employed as priors of view latent variables for video generation and novel-view prediction without viewpoint annotations. Experiments on multiple datasets demonstrate that the proposed model can make object-centric video decomposition, reconstruct the complete shapes of occluded objects, and make novel-view predictions.
    Bounding Box-based Multi-objective Bayesian Optimization of Risk Measures under Input Uncertainty. (arXiv:2301.11588v2 [stat.ML] UPDATED)
    In this study, we propose a novel multi-objective Bayesian optimization (MOBO) method to efficiently identify the Pareto front (PF) defined by risk measures for black-box functions under the presence of input uncertainty (IU). Existing BO methods for Pareto optimization in the presence of IU are risk-specific or without theoretical guarantees, whereas our proposed method addresses general risk measures and has theoretical guarantees. The basic idea of the proposed method is to assume a Gaussian process (GP) model for the black-box function and to construct high-probability bounding boxes for the risk measures using the GP model. Furthermore, in order to reduce the uncertainty of non-dominated bounding boxes, we propose a method of selecting the next evaluation point using a maximin distance defined by the maximum value of a quasi distance based on bounding boxes. As theoretical analysis, we prove that the algorithm can return an arbitrary-accurate solution in a finite number of iterations with high probability, for various risk measures such as Bayes risk, worst-case risk, and value-at-risk. We also give a theoretical analysis that takes into account approximation errors because there exist non-negligible approximation errors (e.g., finite approximation of PFs and sampling-based approximation of bounding boxes) in practice. We confirm that the proposed method outperforms compared with existing methods not only in the setting with IU but also in the setting of ordinary MOBO through numerical experiments.
    Trust, but Verify: Robust Image Segmentation using Deep Learning. (arXiv:2310.16999v1 [cs.CV])
    We describe a method for verifying the output of a deep neural network for medical image segmentation that is robust to several classes of random as well as worst-case perturbations i.e. adversarial attacks. This method is based on a general approach recently developed by the authors called ``Trust, but Verify" wherein an auxiliary verification network produces predictions about certain masked features in the input image using the segmentation as an input. A well-designed auxiliary network will produce high-quality predictions when the input segmentations are accurate, but will produce low-quality predictions when the segmentations are incorrect. Checking the predictions of such a network with the original image allows us to detect bad segmentations. However, to ensure the verification method is truly robust, we need a method for checking the quality of the predictions that does not itself rely on a black-box neural network. Indeed, we show that previous methods for segmentation evaluation that do use deep neural regression networks are vulnerable to false negatives i.e. can inaccurately label bad segmentations as good. We describe the design of a verification network that avoids such vulnerability and present results to demonstrate its robustness compared to previous methods.
    The statistical thermodynamics of generative diffusion models. (arXiv:2310.17467v1 [stat.ML])
    Generative diffusion models have achieved spectacular performance in many areas of generative modeling. While the fundamental ideas behind these models come from non-equilibrium physics, in this paper we show that many aspects of these models can be understood using the tools of equilibrium statistical mechanics. Using this reformulation, we show that generative diffusion models undergo second-order phase transitions corresponding to symmetry breaking phenomena. We argue that this lead to a form of instability that lies at the heart of their generative capabilities and that can be described by a set of mean field critical exponents. We conclude by analyzing recent work connecting diffusion models and associative memory networks in view of the thermodynamic formulations.
    Bias in Evaluation Processes: An Optimization-Based Model. (arXiv:2310.17489v1 [cs.CY])
    Biases with respect to socially-salient attributes of individuals have been well documented in evaluation processes used in settings such as admissions and hiring. We view such an evaluation process as a transformation of a distribution of the true utility of an individual for a task to an observed distribution and model it as a solution to a loss minimization problem subject to an information constraint. Our model has two parameters that have been identified as factors leading to biases: the resource-information trade-off parameter in the information constraint and the risk-averseness parameter in the loss function. We characterize the distributions that arise from our model and study the effect of the parameters on the observed distribution. The outputs of our model enrich the class of distributions that can be used to capture variation across groups in the observed evaluations. We empirically validate our model by fitting real-world datasets and use it to study the effect of interventions in a downstream selection task. These results contribute to an understanding of the emergence of bias in evaluation processes and provide tools to guide the deployment of interventions to mitigate biases.
    Transferring a molecular foundation model for polymer property predictions. (arXiv:2310.16958v1 [cs.LG])
    Transformer-based large language models have remarkable potential to accelerate design optimization for applications such as drug development and materials discovery. Self-supervised pretraining of transformer models requires large-scale datasets, which are often sparsely populated in topical areas such as polymer science. State-of-the-art approaches for polymers conduct data augmentation to generate additional samples but unavoidably incurs extra computational costs. In contrast, large-scale open-source datasets are available for small molecules and provide a potential solution to data scarcity through transfer learning. In this work, we show that using transformers pretrained on small molecules and fine-tuned on polymer properties achieve comparable accuracy to those trained on augmented polymer datasets for a series of benchmark prediction tasks.
    BOOST: Harnessing Black-Box Control to Boost Commonsense in LMs' Generation. (arXiv:2310.17054v1 [cs.CL])
    Large language models (LLMs) such as GPT-3 have demonstrated a strong capability to generate coherent and contextually relevant text. However, amidst their successes, a crucial issue persists: their generated outputs still lack commonsense at times. Moreover, fine-tuning the entire LLM towards more commonsensical outputs is computationally expensive if not infeasible. In this paper, we present a computation-efficient framework that steers a frozen Pre-Trained Language Model (PTLM) towards more commonsensical generation (i.e., producing a plausible output that incorporates a list of concepts in a meaningful way). Specifically, we first construct a reference-free evaluator that assigns a sentence with a commonsensical score by grounding the sentence to a dynamic commonsense knowledge base from four different relational aspects. We then use the scorer as the oracle for commonsense knowledge, and extend the controllable generation method called NADO to train an auxiliary head that guides a fixed PTLM to better satisfy the oracle. We test our framework on a series of GPT-2-, Flan-T5-, and Alpaca-based language models (LMs) on two constrained concept-to-sentence benchmarks. Human evaluation results demonstrate that our method consistently leads to the most commonsensical outputs.
    On the Convergence of CART under Sufficient Impurity Decrease Condition. (arXiv:2310.17114v1 [stat.ML])
    The decision tree is a flexible machine learning model that finds its success in numerous applications. It is usually fitted in a recursively greedy manner using CART. In this paper, we investigate the convergence rate of CART under a regression setting. First, we establish an upper bound on the prediction error of CART under a sufficient impurity decrease (SID) condition \cite{chi2022asymptotic} -- our result improves upon the known result by \cite{chi2022asymptotic} under a similar assumption. Furthermore, we provide examples that demonstrate the error bound cannot be further improved by more than a constant or a logarithmic factor. Second, we introduce a set of easily verifiable sufficient conditions for the SID condition. Specifically, we demonstrate that the SID condition can be satisfied in the case of an additive model, provided that the component functions adhere to a ``locally reverse Poincar{\'e} inequality". We discuss several well-known function classes in non-parametric estimation to illustrate the practical utility of this concept.
    Harnessing the Power of Choices in Decision Tree Learning. (arXiv:2310.01551v2 [cs.LG] UPDATED)
    We propose a simple generalization of standard and empirically successful decision tree learning algorithms such as ID3, C4.5, and CART. These algorithms, which have been central to machine learning for decades, are greedy in nature: they grow a decision tree by iteratively splitting on the best attribute. Our algorithm, Top-$k$, considers the $k$ best attributes as possible splits instead of just the single best attribute. We demonstrate, theoretically and empirically, the power of this simple generalization. We first prove a {\sl greediness hierarchy theorem} showing that for every $k \in \mathbb{N}$, Top-$(k+1)$ can be dramatically more powerful than Top-$k$: there are data distributions for which the former achieves accuracy $1-\varepsilon$, whereas the latter only achieves accuracy $\frac1{2}+\varepsilon$. We then show, through extensive experiments, that Top-$k$ outperforms the two main approaches to decision tree learning: classic greedy algorithms and more recent "optimal decision tree" algorithms. On one hand, Top-$k$ consistently enjoys significant accuracy gains over greedy algorithms across a wide range of benchmarks. On the other hand, Top-$k$ is markedly more scalable than optimal decision tree algorithms and is able to handle dataset and feature set sizes that remain far beyond the reach of these algorithms.
    Privately Aligning Language Models with Reinforcement Learning. (arXiv:2310.16960v1 [cs.LG])
    Positioned between pre-training and user deployment, aligning large language models (LLMs) through reinforcement learning (RL) has emerged as a prevailing strategy for training instruction following-models such as ChatGPT. In this work, we initiate the study of privacy-preserving alignment of LLMs through Differential Privacy (DP) in conjunction with RL. Following the influential work of Ziegler et al. (2020), we study two dominant paradigms: (i) alignment via RL without human in the loop (e.g., positive review generation) and (ii) alignment via RL from human feedback (RLHF) (e.g., summarization in a human-preferred way). We give a new DP framework to achieve alignment via RL, and prove its correctness. Our experimental results validate the effectiveness of our approach, offering competitive utility while ensuring strong privacy protections.
    Learning to Rank for Active Learning via Multi-Task Bilevel Optimization. (arXiv:2310.17044v1 [cs.LG])
    Active learning is a promising paradigm to reduce the labeling cost by strategically requesting labels to improve model performance. However, existing active learning methods often rely on expensive acquisition function to compute, extensive modeling retraining and multiple rounds of interaction with annotators. To address these limitations, we propose a novel approach for active learning, which aims to select batches of unlabeled instances through a learned surrogate model for data acquisition. A key challenge in this approach is developing an acquisition function that generalizes well, as the history of data, which forms part of the utility function's input, grows over time. Our novel algorithmic contribution is a bilevel multi-task bilevel optimization framework that predicts the relative utility -- measured by the validation accuracy -- of different training sets, and ensures the learned acquisition function generalizes effectively. For cases where validation accuracy is expensive to evaluate, we introduce efficient interpolation-based surrogate models to estimate the utility function, reducing the evaluation cost. We demonstrate the performance of our approach through extensive experiments on standard active classification benchmarks. By employing our learned utility function, we show significant improvements over traditional techniques, paving the way for more efficient and effective utility maximization in active learning applications.
    Strategizing EV Charging and Renewable Integration in Texas. (arXiv:2310.17056v1 [eess.SY])
    Exploring the convergence of electric vehicles (EVs), renewable energy, and smart grid technologies in the context of Texas, this study addresses challenges hindering the widespread adoption of EVs. Acknowledging their environmental benefits, the research focuses on grid stability concerns, uncoordinated charging patterns, and the complicated relationship between EVs and renewable energy sources. Dynamic time warping (DTW) clustering and k-means clustering methodologies categorize days based on total load and net load, offering nuanced insights into daily electricity consumption and renewable energy generation patterns. By establishing optimal charging and vehicle-to-grid (V2G) windows tailored to specific load characteristics, the study provides a sophisticated methodology for strategic decision-making in energy consumption and renewable integration. The findings contribute to the ongoing discourse on achieving a sustainable and resilient energy future through the seamless integration of EVs into smart grids.
    Isometric Motion Manifold Primitives. (arXiv:2310.17072v1 [cs.AI])
    The Motion Manifold Primitive (MMP) produces, for a given task, a continuous manifold of trajectories each of which can successfully complete the task. It consists of the decoder function that parametrizes the manifold and the probability density in the latent coordinate space. In this paper, we first show that the MMP performance can significantly degrade due to the geometric distortion in the latent space -- by distortion, we mean that similar motions are not located nearby in the latent space. We then propose {\it Isometric Motion Manifold Primitives (IMMP)} whose latent coordinate space preserves the geometry of the manifold. For this purpose, we formulate and use a Riemannian metric for the motion space (i.e., parametric curve space), which we call a {\it CurveGeom Riemannian metric}. Experiments with planar obstacle-avoiding motions and pushing manipulation tasks show that IMMP significantly outperforms existing MMP methods. Code is available at https://github.com/Gabe-YHLee/IMMP-public.
    Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models. (arXiv:2310.17086v1 [cs.LG])
    Transformers are remarkably good at in-context learning (ICL) -- learning from demonstrations without parameter updates -- but how they perform ICL remains a mystery. Recent work suggests that Transformers may learn in-context by internally running Gradient Descent, a first-order optimization method. In this paper, we instead demonstrate that Transformers learn to implement higher-order optimization methods to perform ICL. Focusing on in-context linear regression, we show that Transformers learn to implement an algorithm very similar to Iterative Newton's Method, a higher-order optimization method, rather than Gradient Descent. Empirically, we show that predictions from successive Transformer layers closely match different iterations of Newton's Method linearly, with each middle layer roughly computing 3 iterations. In contrast, exponentially more Gradient Descent steps are needed to match an additional Transformers layer; this suggests that Transformers have an comparable rate of convergence with high-order methods such as Iterative Newton, which are exponentially faster than Gradient Descent. We also show that Transformers can learn in-context on ill-conditioned data, a setting where Gradient Descent struggles but Iterative Newton succeeds. Finally, we show theoretical results which support our empirical findings and have a close correspondence with them: we prove that Transformers can implement $k$ iterations of Newton's method with $\mathcal{O}(k)$ layers.
    Efficient Neural Network Approaches for Conditional Optimal Transport with Applications in Bayesian Inference. (arXiv:2310.16975v1 [stat.ML])
    We present two neural network approaches that approximate the solutions of static and dynamic conditional optimal transport (COT) problems, respectively. Both approaches enable sampling and density estimation of conditional probability distributions, which are core tasks in Bayesian inference. Our methods represent the target conditional distributions as transformations of a tractable reference distribution and, therefore, fall into the framework of measure transport. COT maps are a canonical choice within this framework, with desirable properties such as uniqueness and monotonicity. However, the associated COT problems are computationally challenging, even in moderate dimensions. To improve the scalability, our numerical algorithms leverage neural networks to parameterize COT maps. Our methods exploit the structure of the static and dynamic formulations of the COT problem. PCP-Map models conditional transport maps as the gradient of a partially input convex neural network (PICNN) and uses a novel numerical implementation to increase computational efficiency compared to state-of-the-art alternatives. COT-Flow models conditional transports via the flow of a regularized neural ODE; it is slower to train but offers faster sampling. We demonstrate their effectiveness and efficiency by comparing them with state-of-the-art approaches using benchmark datasets and Bayesian inverse problems.
    Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark. (arXiv:2310.16981v1 [cs.LG])
    Synthetic data serves as an alternative in training machine learning models, particularly when real-world data is limited or inaccessible. However, ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task. This paper addresses this issue by exploring the potential of integrating data-centric AI techniques which profile the data to guide the synthetic data generation process. Moreover, we shed light on the often ignored consequences of neglecting these data profiles during synthetic data generation -- despite seemingly high statistical fidelity. Subsequently, we propose a novel framework to evaluate the integration of data profiles to guide the creation of more representative synthetic data. In an empirical study, we evaluate the performance of five state-of-the-art models for tabular data generation on eleven distinct tabular datasets. The findings offer critical insights into the successes and limitations of current synthetic data generation techniques. Finally, we provide practical recommendations for integrating data-centric insights into the synthetic data generation process, with a specific focus on classification performance, model selection, and feature selection. This study aims to reevaluate conventional approaches to synthetic data generation and promote the application of data-centric AI techniques in improving the quality and effectiveness of synthetic data.
    Conditionally Combining Robot Skills using Large Language Models. (arXiv:2310.17019v1 [cs.LG])
    This paper combines two contributions. First, we introduce an extension of the Meta-World benchmark, which we call "Language-World," which allows a large language model to operate in a simulated robotic environment using semi-structured natural language queries and scripted skills described using natural language. By using the same set of tasks as Meta-World, Language-World results can be easily compared to Meta-World results, allowing for a point of comparison between recent methods using Large Language Models (LLMs) and those using Deep Reinforcement Learning. Second, we introduce a method we call Plan Conditioned Behavioral Cloning (PCBC), that allows finetuning the behavior of high-level plans using end-to-end demonstrations. Using Language-World, we show that PCBC is able to achieve strong performance in a variety of few-shot regimes, often achieving task generalization with as little as a single demonstration. We have made Language-World available as open-source software at https://github.com/krzentner/language-world/.
    Streaming Factor Trajectory Learning for Temporal Tensor Decomposition. (arXiv:2310.17021v1 [cs.LG])
    Practical tensor data is often along with time information. Most existing temporal decomposition approaches estimate a set of fixed factors for the objects in each tensor mode, and hence cannot capture the temporal evolution of the objects' representation. More important, we lack an effective approach to capture such evolution from streaming data, which is common in real-world applications. To address these issues, we propose Streaming Factor Trajectory Learning (SFTL) for temporal tensor decomposition. We use Gaussian processes (GPs) to model the trajectory of factors so as to flexibly estimate their temporal evolution. To address the computational challenges in handling streaming data, we convert the GPs into a state-space prior by constructing an equivalent stochastic differential equation (SDE). We develop an efficient online filtering algorithm to estimate a decoupled running posterior of the involved factor states upon receiving new data. The decoupled estimation enables us to conduct standard Rauch-Tung-Striebel smoothing to compute the full posterior of all the trajectories in parallel, without the need for revisiting any previous data. We have shown the advantage of SFTL in both synthetic tasks and real-world applications.
    Road Network Guided Fine-Grained Urban Traffic Flow Inference. (arXiv:2109.14251v3 [cs.LG] UPDATED)
    Accurate inference of fine-grained traffic flow from coarse-grained one is an emerging yet crucial problem, which can help greatly reduce the number of the required traffic monitoring sensors for cost savings. In this work, we notice that traffic flow has a high correlation with road network, which was either completely ignored or simply treated as an external factor in previous works. To facilitate this problem, we propose a novel Road-Aware Traffic Flow Magnifier (RATFM) that explicitly exploits the prior knowledge of road networks to fully learn the road-aware spatial distribution of fine-grained traffic flow. Specifically, a multi-directional 1D convolutional layer is first introduced to extract the semantic feature of the road network. Subsequently, we incorporate the road network feature and coarse-grained flow feature to regularize the short-range spatial distribution modeling of road-relative traffic flow. Furthermore, we take the road network feature as a query to capture the long-range spatial distribution of traffic flow with a transformer architecture. Benefiting from the road-aware inference mechanism, our method can generate high-quality fine-grained traffic flow maps. Extensive experiments on three real-world datasets show that the proposed RATFM outperforms state-of-the-art models under various scenarios. Our code and datasets are released at {\url{https://github.com/luimoli/RATFM}}.
    Label Embedding via Low-Coherence Matrices. (arXiv:2305.19470v3 [cs.LG] UPDATED)
    Label embedding is a framework for multiclass classification problems where each label is represented by a distinct vector of some fixed dimension, and training involves matching model output to the vector representing the correct label. While label embedding has been successfully applied in extreme classification and zero-shot learning, and offers both computational and statistical advantages, its theoretical foundations remain poorly understood. This work presents an analysis of label embedding in the context of extreme multiclass classification, where the number of classes $C$ is very large. We present an excess risk bound that reveals a trade-off between computational and statistical efficiency, quantified via the coherence of the embedding matrix. We further show that under the Massart noise condition, the statistical penalty for label embedding vanishes with sufficiently low coherence. Our analysis supports an algorithm that is simple, scalable, and easily parallelizable, and experimental results demonstrate its effectiveness in large-scale applications.
    Controlled Decoding from Language Models. (arXiv:2310.17022v1 [cs.LG])
    We propose controlled decoding (CD), a novel off-policy reinforcement learning method to control the autoregressive generation from language models towards high reward outcomes. CD solves an off-policy reinforcement learning problem through a value function for the reward, which we call a prefix scorer. The prefix scorer is used at inference time to steer the generation towards higher reward outcomes. We show that the prefix scorer may be trained on (possibly) off-policy data to predict the expected reward when decoding is continued from a partially decoded response. We empirically demonstrate that CD is effective as a control mechanism on Reddit conversations corpus. We also show that the modularity of the design of CD makes it possible to control for multiple rewards, effectively solving a multi-objective reinforcement learning problem with no additional complexity. Finally, we show that CD can be applied in a novel blockwise fashion at inference-time, again without the need for any training-time changes, essentially bridging the gap between the popular best-of-$K$ strategy and token-level reinforcement learning. This makes CD a promising approach for alignment of language models.
    Towards Matching Phones and Speech Representations. (arXiv:2310.17558v1 [cs.CL])
    Learning phone types from phone instances has been a long-standing problem, while still being open. In this work, we revisit this problem in the context of self-supervised learning, and pose it as the problem of matching cluster centroids to phone embeddings. We study two key properties that enable matching, namely, whether cluster centroids of self-supervised representations reduce the variability of phone instances and respect the relationship among phones. We then use the matching result to produce pseudo-labels and introduce a new loss function for improving self-supervised representations. Our experiments show that the matching result captures the relationship among phones. Training the new loss function jointly with the regular self-supervised losses, such as APC and CPC, significantly improves the downstream phone classification.
    Lithium Metal Battery Quality Control via Transformer-CNN Segmentation. (arXiv:2302.04824v2 [cs.CV] UPDATED)
    Lithium metal battery (LMB) has the potential to be the next-generation battery system because of its high theoretical energy density. However, defects known as dendrites are formed by heterogeneous lithium (Li) plating, which hinders the development and utilization of LMBs. Non-destructive techniques to observe the dendrite morphology often use X-ray computed tomography (XCT) to provide cross-sectional views. To retrieve three-dimensional structures inside a battery, image segmentation becomes essential to quantitatively analyze XCT images. This work proposes a new semantic segmentation approach using a transformer-based neural network called TransforCNN that is capable of segmenting out dendrites from XCT data. In addition, we compare the performance of the proposed TransforCNN with three other algorithms, such as U-Net, Y-Net, and E-Net, consisting of an Ensemble Network model for XCT analysis. Our results show the advantages of using TransforCNN when evaluating over-segmentation metrics, such as mean Intersection over Union (mIoU) and mean Dice Similarity Coefficient (mDSC) as well as through several qualitatively comparative visualizations.
    Statistically Valid Variable Importance Assessment through Conditional Permutations. (arXiv:2309.07593v2 [cs.LG] UPDATED)
    Variable importance assessment has become a crucial step in machine-learning applications when using complex learners, such as deep neural networks, on large-scale data. Removal-based importance assessment is currently the reference approach, particularly when statistical guarantees are sought to justify variable inclusion. It is often implemented with variable permutation schemes. On the flip side, these approaches risk misidentifying unimportant variables as important in the presence of correlations among covariates. Here we develop a systematic approach for studying Conditional Permutation Importance (CPI) that is model agnostic and computationally lean, as well as reusable benchmarks of state-of-the-art variable importance estimators. We show theoretically and empirically that $\textit{CPI}$ overcomes the limitations of standard permutation importance by providing accurate type-I error control. When used with a deep neural network, $\textit{CPI}$ consistently showed top accuracy across benchmarks. An experiment on real-world data analysis in a large-scale medical dataset showed that $\textit{CPI}$ provides a more parsimonious selection of statistically significant variables. Our results suggest that $\textit{CPI}$ can be readily used as drop-in replacement for permutation-based methods.
    Looping in the Human: Collaborative and Explainable Bayesian Optimization. (arXiv:2310.17273v1 [cs.LG])
    Like many optimizers, Bayesian optimization often falls short of gaining user trust due to opacity. While attempts have been made to develop human-centric optimizers, they typically assume user knowledge is well-specified and error-free, employing users mainly as supervisors of the optimization process. We relax these assumptions and propose a more balanced human-AI partnership with our Collaborative and Explainable Bayesian Optimization (CoExBO) framework. Instead of explicitly requiring a user to provide a knowledge model, CoExBO employs preference learning to seamlessly integrate human insights into the optimization, resulting in algorithmic suggestions that resonate with user preference. CoExBO explains its candidate selection every iteration to foster trust, empowering users with a clearer grasp of the optimization. Furthermore, CoExBO offers a no-harm guarantee, allowing users to make mistakes; even with extreme adversarial interventions, the algorithm converges asymptotically to a vanilla Bayesian optimization. We validate CoExBO's efficacy through human-AI teaming experiments in lithium-ion battery design, highlighting substantial improvements over conventional methods.
    Neuro-Inspired Fragmentation and Recall to Overcome Catastrophic Forgetting in Curiosity. (arXiv:2310.17537v1 [cs.AI])
    Deep reinforcement learning methods exhibit impressive performance on a range of tasks but still struggle on hard exploration tasks in large environments with sparse rewards. To address this, intrinsic rewards can be generated using forward model prediction errors that decrease as the environment becomes known, and incentivize an agent to explore novel states. While prediction-based intrinsic rewards can help agents solve hard exploration tasks, they can suffer from catastrophic forgetting and actually increase at visited states. We first examine the conditions and causes of catastrophic forgetting in grid world environments. We then propose a new method FARCuriosity, inspired by how humans and animals learn. The method depends on fragmentation and recall: an agent fragments an environment based on surprisal, and uses different local curiosity modules (prediction-based intrinsic reward functions) for each fragment so that modules are not trained on the entire environment. At each fragmentation event, the agent stores the current module in long-term memory (LTM) and either initializes a new module or recalls a previously stored module based on its match with the current state. With fragmentation and recall, FARCuriosity achieves less forgetting and better overall performance in games with varied and heterogeneous environments in the Atari benchmark suite of tasks. Thus, this work highlights the problem of catastrophic forgetting in prediction-based curiosity methods and proposes a solution.
    Efficient Numerical Algorithm for Large-Scale Damped Natural Gradient Descent. (arXiv:2310.17556v1 [cs.LG])
    We propose a new algorithm for efficiently solving the damped Fisher matrix in large-scale scenarios where the number of parameters significantly exceeds the number of available samples. This problem is fundamental for natural gradient descent and stochastic reconfiguration. Our algorithm is based on Cholesky decomposition and is generally applicable. Benchmark results show that the algorithm is significantly faster than existing methods.
    Fine-tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning. (arXiv:2203.09249v2 [cs.LG] UPDATED)
    Federated Learning (FL) is an emerging distributed learning paradigm under privacy constraint. Data heterogeneity is one of the main challenges in FL, which results in slow convergence and degraded performance. Most existing approaches only tackle the heterogeneity challenge by restricting the local model update in client, ignoring the performance drop caused by direct global model aggregation. Instead, we propose a data-free knowledge distillation method to fine-tune the global model in the server (FedFTG), which relieves the issue of direct model aggregation. Concretely, FedFTG explores the input space of local models through a generator, and uses it to transfer the knowledge from local models to the global model. Besides, we propose a hard sample mining scheme to achieve effective knowledge distillation throughout the training. In addition, we develop customized label sampling and class-level ensemble to derive maximum utilization of knowledge, which implicitly mitigates the distribution discrepancy across clients. Extensive experiments show that our FedFTG significantly outperforms the state-of-the-art (SOTA) FL algorithms and can serve as a strong plugin for enhancing FedAvg, FedProx, FedDyn, and SCAFFOLD.
    Read and Reap the Rewards: Learning to Play Atari with the Help of Instruction Manuals. (arXiv:2302.04449v3 [cs.LG] UPDATED)
    High sample complexity has long been a challenge for RL. On the other hand, humans learn to perform tasks not only from interaction or demonstrations, but also by reading unstructured text documents, e.g., instruction manuals. Instruction manuals and wiki pages are among the most abundant data that could inform agents of valuable features and policies or task-specific environmental dynamics and reward structures. Therefore, we hypothesize that the ability to utilize human-written instruction manuals to assist learning policies for specific tasks should lead to a more efficient and better-performing agent. We propose the Read and Reward framework. Read and Reward speeds up RL algorithms on Atari games by reading manuals released by the Atari game developers. Our framework consists of a QA Extraction module that extracts and summarizes relevant information from the manual and a Reasoning module that evaluates object-agent interactions based on information from the manual. An auxiliary reward is then provided to a standard A2C RL agent, when interaction is detected. Experimentally, various RL algorithms obtain significant improvement in performance and training speed when assisted by our design.
    SoK: Pitfalls in Evaluating Black-Box Attacks. (arXiv:2310.17534v1 [cs.CR])
    Numerous works study black-box attacks on image classifiers. However, these works make different assumptions on the adversary's knowledge and current literature lacks a cohesive organization centered around the threat model. To systematize knowledge in this area, we propose a taxonomy over the threat space spanning the axes of feedback granularity, the access of interactive queries, and the quality and quantity of the auxiliary data available to the attacker. Our new taxonomy provides three key insights. 1) Despite extensive literature, numerous under-explored threat spaces exist, which cannot be trivially solved by adapting techniques from well-explored settings. We demonstrate this by establishing a new state-of-the-art in the less-studied setting of access to top-k confidence scores by adapting techniques from well-explored settings of accessing the complete confidence vector, but show how it still falls short of the more restrictive setting that only obtains the prediction label, highlighting the need for more research. 2) Identification the threat model of different attacks uncovers stronger baselines that challenge prior state-of-the-art claims. We demonstrate this by enhancing an initially weaker baseline (under interactive query access) via surrogate models, effectively overturning claims in the respective paper. 3) Our taxonomy reveals interactions between attacker knowledge that connect well to related areas, such as model inversion and extraction attacks. We discuss how advances in other areas can enable potentially stronger black-box attacks. Finally, we emphasize the need for a more realistic assessment of attack success by factoring in local attack runtime. This approach reveals the potential for certain attacks to achieve notably higher success rates and the need to evaluate attacks in diverse and harder settings, highlighting the need for better selection criteria.
    Little Exploration is All You Need. (arXiv:2310.17538v1 [cs.LG])
    The prevailing principle of "Optimism in the Face of Uncertainty" advocates for the incorporation of an exploration bonus, generally assumed to be proportional to the inverse square root of the visit count ($1/\sqrt{n}$), where $n$ is the number of visits to a particular state-action pair. This approach, however, exclusively focuses on "uncertainty," neglecting the inherent "difficulty" of different options. To address this gap, we introduce a novel modification of standard UCB algorithm in the multi-armed bandit problem, proposing an adjusted bonus term of $1/n^\tau$, where $\tau > 1/2$, that accounts for task difficulty. Our proposed algorithm, denoted as UCB$^\tau$, is substantiated through comprehensive regret and risk analyses, confirming its theoretical robustness. Comparative evaluations with standard UCB and Thompson Sampling algorithms on synthetic datasets demonstrate that UCB$^\tau$ not only outperforms in efficacy but also exhibits lower risk across various environmental conditions and hyperparameter settings.
    Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions. (arXiv:2310.17502v1 [cs.SD])
    Customizing voice and speaking style in a speech synthesis system with intuitive and fine-grained controls is challenging, given that little data with appropriate labels is available. Furthermore, editing an existing human's voice also comes with ethical concerns. In this paper, we propose a method to generate artificial speaker embeddings that cannot be linked to a real human while offering intuitive and fine-grained control over the voice and speaking style of the embeddings, without requiring any labels for speaker or style. The artificial and controllable embeddings can be fed to a speech synthesis system, conditioned on embeddings of real humans during training, without sacrificing privacy during inference.
    Cross-feature Contrastive Loss for Decentralized Deep Learning on Heterogeneous Data. (arXiv:2310.15890v2 [cs.LG] UPDATED)
    The current state-of-the-art decentralized learning algorithms mostly assume the data distribution to be Independent and Identically Distributed (IID). However, in practical scenarios, the distributed datasets can have significantly heterogeneous data distributions across the agents. In this work, we present a novel approach for decentralized learning on heterogeneous data, where data-free knowledge distillation through contrastive loss on cross-features is utilized to improve performance. Cross-features for a pair of neighboring agents are the features (i.e., last hidden layer activations) obtained from the data of an agent with respect to the model parameters of the other agent. We demonstrate the effectiveness of the proposed technique through an exhaustive set of experiments on various Computer Vision datasets (CIFAR-10, CIFAR-100, Fashion MNIST, Imagenette, and ImageNet), model architectures, and network topologies. Our experiments show that the proposed method achieves superior performance (0.2-4% improvement in test accuracy) compared to other existing techniques for decentralized learning on heterogeneous data.
    Improving Few-Shot Learning through Multi-task Representation Learning Theory. (arXiv:2010.01992v3 [cs.LG] CROSS LISTED)
    In this paper, we consider the framework of multi-task representation (MTR) learning where the goal is to use source tasks to learn a representation that reduces the sample complexity of solving a target task. We start by reviewing recent advances in MTR theory and show that they can provide novel insights for popular meta-learning algorithms when analyzed within this framework. In particular, we highlight a fundamental difference between gradient-based and metric-based algorithms in practice and put forward a theoretical analysis to explain it. Finally, we use the derived insights to improve the performance of meta-learning methods via a new spectral-based regularization term and confirm its efficiency through experimental studies on few-shot classification benchmarks. To the best of our knowledge, this is the first contribution that puts the most recent learning bounds of MTR theory into practice for the task of few-shot classification.
    StochGradAdam: Accelerating Neural Networks Training with Stochastic Gradient Sampling. (arXiv:2310.17042v1 [cs.LG])
    In the rapidly advancing domain of deep learning optimization, this paper unveils the StochGradAdam optimizer, a novel adaptation of the well-regarded Adam algorithm. Central to StochGradAdam is its gradient sampling technique. This method not only ensures stable convergence but also leverages the advantages of selective gradient consideration, fostering robust training by potentially mitigating the effects of noisy or outlier data and enhancing the exploration of the loss landscape for more dependable convergence. In both image classification and segmentation tasks, StochGradAdam has demonstrated superior performance compared to the traditional Adam optimizer. By judiciously sampling a subset of gradients at each iteration, the optimizer is optimized for managing intricate models. The paper provides a comprehensive exploration of StochGradAdam's methodology, from its mathematical foundations to bias correction strategies, heralding a promising advancement in deep learning training techniques.
    FedPEAT: Convergence of Federated Learning, Parameter-Efficient Fine Tuning, and Emulator Assisted Tuning for Artificial Intelligence Foundation Models with Mobile Edge Computing. (arXiv:2310.17491v1 [cs.LG])
    The emergence of foundation models, including language and vision models, has reshaped AI's landscape, offering capabilities across various applications. Deploying and fine-tuning these large models, like GPT-3 and BERT, presents challenges, especially in the current foundation model era. We introduce Emulator-Assisted Tuning (EAT) combined with Parameter-Efficient Fine-Tuning (PEFT) to form Parameter-Efficient Emulator-Assisted Tuning (PEAT). Further, we expand this into federated learning as Federated PEAT (FedPEAT). FedPEAT uses adapters, emulators, and PEFT for federated model tuning, enhancing model privacy and memory efficiency. Adapters adjust pre-trained models, while emulators give a compact representation of original models, addressing both privacy and efficiency. Adaptable to various neural networks, our approach also uses deep reinforcement learning for hyper-parameter optimization. We tested FedPEAT in a unique scenario with a server participating in collaborative federated tuning, showcasing its potential in tackling foundation model challenges.
    IDENAS: Internal Dependency Exploration for Neural Architecture Search. (arXiv:2310.17250v1 [cs.LG])
    Machine learning is a powerful tool for extracting valuable information and making various predictions from diverse datasets. Traditional algorithms rely on well-defined input and output variables however, there are scenarios where the distinction between the input and output variables and the underlying, associated (input and output) layers of the model, are unknown. Neural Architecture Search (NAS) and Feature Selection have emerged as promising solutions in such scenarios. This research proposes IDENAS, an Internal Dependency-based Exploration for Neural Architecture Search, integrating NAS with feature selection. The methodology explores internal dependencies in the complete parameter space for classification involving 1D sensor and 2D image data as well. IDENAS employs a modified encoder-decoder model and the Sequential Forward Search (SFS) algorithm, combining input-output configuration search with embedded feature selection. Experimental results demonstrate IDENASs superior performance in comparison to other algorithms, showcasing its effectiveness in model development pipelines and automated machine learning. On average, IDENAS achieved significant modelling improvements, underscoring its significant contribution to advancing the state-of-the-art in neural architecture search and feature selection integration.
    Diagnosing Alzheimer's Disease using Early-Late Multimodal Data Fusion with Jacobian Maps. (arXiv:2310.16936v1 [cs.CV])
    Alzheimer's disease (AD) is a prevalent and debilitating neurodegenerative disorder impacting a large aging population. Detecting AD in all its presymptomatic and symptomatic stages is crucial for early intervention and treatment. An active research direction is to explore machine learning methods that harness multimodal data fusion to outperform human inspection of medical scans. However, existing multimodal fusion models have limitations, including redundant computation, complex architecture, and simplistic handling of missing data. Moreover, the preprocessing pipelines of medical scans remain inadequately detailed and are seldom optimized for individual subjects. In this paper, we propose an efficient early-late fusion (ELF) approach, which leverages a convolutional neural network for automated feature extraction and random forests for their competitive performance on small datasets. Additionally, we introduce a robust preprocessing pipeline that adapts to the unique characteristics of individual subjects and makes use of whole brain images rather than slices or patches. Moreover, to tackle the challenge of detecting subtle changes in brain volume, we transform images into the Jacobian domain (JD) to enhance both accuracy and robustness in our classification. Using MRI and CT images from the OASIS-3 dataset, our experiments demonstrate the effectiveness of the ELF approach in classifying AD into four stages with an accuracy of 97.19%.
    Enhancing Energy-efficiency by Solving the Throughput Bottleneck of LSTM Cells for Embedded FPGAs. (arXiv:2310.16842v1 [cs.AR])
    To process sensor data in the Internet of Things(IoTs), embedded deep learning for 1-dimensional data is an important technique. In the past, CNNs were frequently used because they are simple to optimise for special embedded hardware such as FPGAs. This work proposes a novel LSTM cell optimisation aimed at energy-efficient inference on end devices. Using the traffic speed prediction as a case study, a vanilla LSTM model with the optimised LSTM cell achieves 17534 inferences per second while consuming only 3.8 $\mu$J per inference on the FPGA \textit{XC7S15} from \textit{Spartan-7} family. It achieves at least 5.4$\times$ faster throughput and 1.37$\times$ more energy efficient than existing approaches.
    Pre-RMSNorm and Pre-CRMSNorm Transformers: Equivalent and Efficient Pre-LN Transformers. (arXiv:2305.14858v2 [cs.LG] UPDATED)
    Transformers have achieved great success in machine learning applications. Normalization techniques, such as Layer Normalization (LayerNorm, LN) and Root Mean Square Normalization (RMSNorm), play a critical role in accelerating and stabilizing the training of Transformers. While LayerNorm recenters and rescales input vectors, RMSNorm only rescales the vectors by their RMS value. Despite being more computationally efficient, RMSNorm may compromise the representation ability of Transformers. There is currently no consensus regarding the preferred normalization technique, as some models employ LayerNorm while others utilize RMSNorm, especially in recent large language models. It is challenging to convert Transformers with one normalization to the other type. While there is an ongoing disagreement between the two normalization types, we propose a solution to unify two mainstream Transformer architectures, Pre-LN and Pre-RMSNorm Transformers. By removing the inherent redundant mean information in the main branch of Pre-LN Transformers, we can reduce LayerNorm to RMSNorm, achieving higher efficiency. We further propose the Compressed RMSNorm (CRMSNorm) and Pre-CRMSNorm Transformer based on a lossless compression of the zero-mean vectors. We formally establish the equivalence of Pre-LN, Pre-RMSNorm, and Pre-CRMSNorm Transformer variants in both training and inference. It implies that Pre-LN Transformers can be substituted with Pre-(C)RMSNorm counterparts at almost no cost, offering the same arithmetic functionality along with free efficiency improvement. Experiments demonstrate that we can reduce the training and inference time of Pre-LN Transformers by 1% - 10%.
    Can Differentiable Decision Trees Learn Interpretable Reward Functions?. (arXiv:2306.13004v3 [cs.LG] UPDATED)
    There is an increasing interest in learning reward functions that model human preferences. However, many frameworks use blackbox learning methods that, while expressive, are difficult to interpret. We propose and evaluate a novel approach for learning expressive and interpretable reward functions from preferences using Differentiable Decision Trees (DDTs). Our experiments across several domains, including CartPole, Visual Gridworld environments and Atari games, provide evidence that that the tree structure of our learned reward function is useful in determining the extent to which the reward function is aligned with human preferences. We provide experimental evidence that reward DDTs can achieve competitive performance when compared with larger capacity deep neural network reward functions. We also observe that the choice between soft and hard (argmax) output of reward DDT reveals a tension between wanting highly shaped rewards to ensure good RL performance, while also wanting simpler, more interpretable rewards.
    Sky Imager-Based Forecast of Solar Irradiance Using Machine Learning. (arXiv:2310.17356v1 [cs.CV])
    Ahead-of-time forecasting of the output power of power plants is essential for the stability of the electricity grid and ensuring uninterrupted service. However, forecasting renewable energy sources is difficult due to the chaotic behavior of natural energy sources. This paper presents a new approach to estimate short-term solar irradiance from sky images. The~proposed algorithm extracts features from sky images and use learning-based techniques to estimate the solar irradiance. The~performance of proposed machine learning (ML) algorithm is evaluated using two publicly available datasets of sky images. The~datasets contain over 350,000 images for an interval of 16 years, from 2004 to 2020, with the corresponding global horizontal irradiance (GHI) of each image as the ground truth. Compared to the state-of-the-art computationally heavy algorithms proposed in the literature, our approach achieves competitive results with much less computational complexity for both nowcasting and forecasting up to 4 h ahead of time.
    Unleashing the potential of GNNs via Bi-directional Knowledge Transfer. (arXiv:2310.17132v1 [cs.LG])
    Based on the message-passing paradigm, there has been an amount of research proposing diverse and impressive feature propagation mechanisms to improve the performance of GNNs. However, less focus has been put on feature transformation, another major operation of the message-passing framework. In this paper, we first empirically investigate the performance of the feature transformation operation in several typical GNNs. Unexpectedly, we notice that GNNs do not completely free up the power of the inherent feature transformation operation. By this observation, we propose the Bi-directional Knowledge Transfer (BiKT), a plug-and-play approach to unleash the potential of the feature transformation operations without modifying the original architecture. Taking the feature transformation operation as a derived representation learning model that shares parameters with the original GNN, the direct prediction by this model provides a topological-agnostic knowledge feedback that can further instruct the learning of GNN and the feature transformations therein. On this basis, BiKT not only allows us to acquire knowledge from both the GNN and its derived model but promotes each other by injecting the knowledge into the other. In addition, a theoretical analysis is further provided to demonstrate that BiKT improves the generalization bound of the GNNs from the perspective of domain adaption. An extensive group of experiments on up to 7 datasets with 5 typical GNNs demonstrates that BiKT brings up to 0.5% - 4% performance gain over the original GNN, which means a boosted GNN is obtained. Meanwhile, the derived model also shows a powerful performance to compete with or even surpass the original GNN, enabling us to flexibly apply it independently to some other specific downstream tasks.
    Effective Targeted Attacks for Adversarial Self-Supervised Learning. (arXiv:2210.10482v2 [cs.LG] UPDATED)
    Recently, unsupervised adversarial training (AT) has been highlighted as a means of achieving robustness in models without any label information. Previous studies in unsupervised AT have mostly focused on implementing self-supervised learning (SSL) frameworks, which maximize the instance-wise classification loss to generate adversarial examples. However, we observe that simply maximizing the self-supervised training loss with an untargeted adversarial attack often results in generating ineffective adversaries that may not help improve the robustness of the trained model, especially for non-contrastive SSL frameworks without negative examples. To tackle this problem, we propose a novel positive mining for targeted adversarial attack to generate effective adversaries for adversarial SSL frameworks. Specifically, we introduce an algorithm that selects the most confusing yet similar target example for a given instance based on entropy and similarity, and subsequently perturbs the given instance towards the selected target. Our method demonstrates significant enhancements in robustness when applied to non-contrastive SSL frameworks, and less but consistent robustness improvements with contrastive SSL frameworks, on the benchmark datasets.
    Learning Space-Time Continuous Neural PDEs from Partially Observed States. (arXiv:2307.04110v2 [cs.LG] UPDATED)
    We introduce a novel grid-independent model for learning partial differential equations (PDEs) from noisy and partial observations on irregular spatiotemporal grids. We propose a space-time continuous latent neural PDE model with an efficient probabilistic framework and a novel encoder design for improved data efficiency and grid independence. The latent state dynamics are governed by a PDE model that combines the collocation method and the method of lines. We employ amortized variational inference for approximate posterior estimation and utilize a multiple shooting technique for enhanced training speed and stability. Our model demonstrates state-of-the-art performance on complex synthetic and real-world datasets, overcoming limitations of previous approaches and effectively handling partially-observed data. The proposed model outperforms recent methods, showing its potential to advance data-driven PDE modeling and enabling robust, grid-independent modeling of complex partially-observed dynamic processes.
    Learning an Inventory Control Policy with General Inventory Arrival Dynamics. (arXiv:2310.17168v1 [cs.LG])
    In this paper we address the problem of learning and backtesting inventory control policies in the presence of general arrival dynamics -- which we term as a quantity-over-time arrivals model (QOT). We also allow for order quantities to be modified as a post-processing step to meet vendor constraints such as order minimum and batch size constraints -- a common practice in real supply chains. To the best of our knowledge this is the first work to handle either arbitrary arrival dynamics or an arbitrary downstream post-processing of order quantities. Building upon recent work (Madeka et al., 2022) we similarly formulate the periodic review inventory control problem as an exogenous decision process, where most of the state is outside the control of the agent. Madeka et al. (2022) show how to construct a simulator that replays historic data to solve this class of problem. In our case, we incorporate a deep generative model for the arrivals process as part of the history replay. By formulating the problem as an exogenous decision process, we can apply results from Madeka et al. (2022) to obtain a reduction to supervised learning. Finally, we show via simulation studies that this approach yields statistically significant improvements in profitability over production baselines. Using data from an ongoing real-world A/B test, we show that Gen-QOT generalizes well to off-policy data.
    A Theory of Link Prediction via Relational Weisfeiler-Leman on Knowledge Graphs. (arXiv:2302.02209v4 [cs.LG] UPDATED)
    Graph neural networks are prominent models for representation learning over graph-structured data. While the capabilities and limitations of these models are well-understood for simple graphs, our understanding remains incomplete in the context of knowledge graphs. Our goal is to provide a systematic understanding of the landscape of graph neural networks for knowledge graphs pertaining to the prominent task of link prediction. Our analysis entails a unifying perspective on seemingly unrelated models and unlocks a series of other models. The expressive power of various models is characterized via a corresponding relational Weisfeiler-Leman algorithm. This analysis is extended to provide a precise logical characterization of the class of functions captured by a class of graph neural networks. The theoretical findings presented in this paper explain the benefits of some widely employed practical design choices, which are validated empirically.
    Exploring the Trie of Rules: a fast data structure for the representation of association rules. (arXiv:2310.17355v1 [cs.LG])
    Association rule mining techniques can generate a large volume of sequential data when implemented on transactional databases. Extracting insights from a large set of association rules has been found to be a challenging process. When examining a ruleset, the fundamental question is how to summarise and represent meaningful mined knowledge efficiently. Many algorithms and strategies have been developed to address issue of knowledge extraction; however, the effectiveness of this process can be limited by the data structures. A better data structure can sufficiently affect the speed of the knowledge extraction process. This paper proposes a novel data structure, called the Trie of rules, for storing a ruleset that is generated by association rule mining. The resulting data structure is a prefix-tree graph structure made of pre-mined rules. This graph stores the rules as paths within the prefix-tree in a way that similar rules overlay each other. Each node in the tree represents a rule where a consequent is this node, and an antecedent is a path from this node to the root of the tree. The evaluation showed that the proposed representation technique is promising. It compresses a ruleset with almost no data loss and benefits in terms of time for basic operations such as searching for a specific rule and sorting, which is the base for many knowledge discovery methods. Moreover, our method demonstrated a significant improvement in traversing time, achieving an 8-fold increase compared to traditional data structures.
    Universal Test-time Adaptation through Weight Ensembling, Diversity Weighting, and Prior Correction. (arXiv:2306.00650v2 [cs.CV] UPDATED)
    Since distribution shifts are likely to occur during test-time and can drastically decrease the model's performance, online test-time adaptation (TTA) continues to update the model after deployment, leveraging the current test data. Clearly, a method proposed for online TTA has to perform well for all kinds of environmental conditions. By introducing the variable factors domain non-stationarity and temporal correlation, we first unfold all practically relevant settings and define the entity as universal TTA. We want to highlight that this is the first work that covers such a broad spectrum, which is indispensable for the use in practice. To tackle the problem of universal TTA, we identify and highlight several challenges a self-training based method has to deal with: 1) model bias and the occurrence of trivial solutions when performing entropy minimization on varying sequence lengths with and without multiple domain shifts, 2) loss of generalization which exacerbates the adaptation to multiple domain shifts and the occurrence of catastrophic forgetting, and 3) performance degradation due to shifts in class prior. To prevent the model from becoming biased, we leverage a dataset and model-agnostic certainty and diversity weighting. In order to maintain generalization and prevent catastrophic forgetting, we propose to continually weight-average the source and adapted model. To compensate for disparities in the class prior during test-time, we propose an adaptive prior correction scheme that reweights the model's predictions. We evaluate our approach, named ROID, on a wide range of settings, datasets, and models, setting new standards in the field of universal TTA. Code is available at: https://github.com/mariodoebler/test-time-adaptation
    STEER: Semantic Turn Extension-Expansion Recognition for Voice Assistants. (arXiv:2310.16990v1 [cs.CL])
    In the context of a voice assistant system, steering refers to the phenomenon in which a user issues a follow-up command attempting to direct or clarify a previous turn. We propose STEER, a steering detection model that predicts whether a follow-up turn is a user's attempt to steer the previous command. Constructing a training dataset for steering use cases poses challenges due to the cold-start problem. To overcome this, we developed heuristic rules to sample opt-in usage data, approximating positive and negative samples without any annotation. Our experimental results show promising performance in identifying steering intent, with over 95% accuracy on our sampled data. Moreover, STEER, in conjunction with our sampling strategy, aligns effectively with real-world steering scenarios, as evidenced by its strong zero-shot performance on a human-graded evaluation set. In addition to relying solely on user transcripts as input, we introduce STEER+, an enhanced version of the model. STEER+ utilizes a semantic parse tree to provide more context on out-of-vocabulary words, such as named entities that often occur at the sentence boundary. This further improves model performance, reducing error rate in domains where entities frequently appear, such as messaging. Lastly, we present a data analysis that highlights the improvement in user experience when voice assistants support steering use cases.
    A Deep Learning Approach to Teeth Segmentation and Orientation from Panoramic X-rays. (arXiv:2310.17176v1 [cs.CV])
    Accurate teeth segmentation and orientation are fundamental in modern oral healthcare, enabling precise diagnosis, treatment planning, and dental implant design. In this study, we present a comprehensive approach to teeth segmentation and orientation from panoramic X-ray images, leveraging deep learning techniques. We build our model based on FUSegNet, a popular model originally developed for wound segmentation, and introduce modifications by incorporating grid-based attention gates into the skip connections. We introduce oriented bounding box (OBB) generation through principal component analysis (PCA) for precise tooth orientation estimation. Evaluating our approach on the publicly available DNS dataset, comprising 543 panoramic X-ray images, we achieve the highest Intersection-over-Union (IoU) score of 82.43% and Dice Similarity Coefficient (DSC) score of 90.37% among compared models in teeth instance segmentation. In OBB analysis, we obtain the Rotated IoU (RIoU) score of 82.82%. We also conduct detailed analyses of individual tooth labels and categorical performance, shedding light on strengths and weaknesses. The proposed model's accuracy and versatility offer promising prospects for improving dental diagnoses, treatment planning, and personalized healthcare in the oral domain. Our generated OBB coordinates and codes are available at https://github.com/mrinal054/Instance_teeth_segmentation.
    Emergent representations in networks trained with the Forward-Forward algorithm. (arXiv:2305.18353v2 [cs.NE] UPDATED)
    The Backpropagation algorithm has often been criticised for its lack of biological realism. In an attempt to find a more biologically plausible alternative, the recently introduced Forward-Forward algorithm replaces the forward and backward passes of Backpropagation with two forward passes. In this work, we show that the internal representations obtained by the Forward-Forward algorithm can organise into category-specific ensembles exhibiting high sparsity - i.e. composed of an extremely low number of active units. This situation is reminiscent of what has been observed in cortical sensory areas, where neuronal ensembles are suggested to serve as the functional building blocks for perception and action. Interestingly, while this sparse pattern does not typically arise in models trained with standard Backpropagation, it can emerge in networks trained with Backpropagation on the same objective proposed for the Forward-Forward algorithm. These results suggest that the learning procedure proposed by Forward-Forward may be superior to Backpropagation in modelling learning in the cortex, even when a backward pass is used.
    Unconstrained Dynamic Regret via Sparse Coding. (arXiv:2301.13349v5 [cs.LG] UPDATED)
    Motivated by the challenge of nonstationarity in sequential decision making, we study Online Convex Optimization (OCO) under the coupling of two problem structures: the domain is unbounded, and the comparator sequence $u_1,\ldots,u_T$ is arbitrarily time-varying. As no algorithm can guarantee low regret simultaneously against all comparator sequences, handling this setting requires moving from minimax optimality to comparator adaptivity. That is, sensible regret bounds should depend on certain complexity measures of the comparator relative to one's prior knowledge. This paper achieves a new type of these adaptive regret bounds via a sparse coding framework. The complexity of the comparator is measured by its energy and its sparsity on a user-specified dictionary, which offers considerable versatility. Equipped with a wavelet dictionary for example, our framework improves the state-of-the-art bound (Jacobsen & Cutkosky, 2022) by adapting to both ($i$) the magnitude of the comparator average $||\bar u||=||\sum_{t=1}^Tu_t/T||$, rather than the maximum $\max_t||u_t||$; and ($ii$) the comparator variability $\sum_{t=1}^T||u_t-\bar u||$, rather than the uncentered sum $\sum_{t=1}^T||u_t||$. Furthermore, our analysis is simpler due to decoupling function approximation from regret minimization.
    An Optimal and Scalable Matrix Mechanism for Noisy Marginals under Convex Loss Functions. (arXiv:2305.08175v2 [cs.DB] UPDATED)
    Noisy marginals are a common form of confidentiality-protecting data release and are useful for many downstream tasks such as contingency table analysis, construction of Bayesian networks, and even synthetic data generation. Privacy mechanisms that provide unbiased noisy answers to linear queries (such as marginals) are known as matrix mechanisms. We propose ResidualPlanner, a matrix mechanism for marginals with Gaussian noise that is both optimal and scalable. ResidualPlanner can optimize for many loss functions that can be written as a convex function of marginal variances (prior work was restricted to just one predefined objective function). ResidualPlanner can optimize the accuracy of marginals in large scale settings in seconds, even when the previous state of the art (HDMM) runs out of memory. It even runs on datasets with 100 attributes in a couple of minutes. Furthermore ResidualPlanner can efficiently compute variance/covariance values for each marginal (prior methods quickly run out of memory, even for relatively small datasets).
    Efficient Diffusion Policies for Offline Reinforcement Learning. (arXiv:2305.20081v2 [cs.LG] UPDATED)
    Offline reinforcement learning (RL) aims to learn optimal policies from offline datasets, where the parameterization of policies is crucial but often overlooked. Recently, Diffsuion-QL significantly boosts the performance of offline RL by representing a policy with a diffusion model, whose success relies on a parametrized Markov Chain with hundreds of steps for sampling. However, Diffusion-QL suffers from two critical limitations. 1) It is computationally inefficient to forward and backward through the whole Markov chain during training. 2) It is incompatible with maximum likelihood-based RL algorithms (e.g., policy gradient methods) as the likelihood of diffusion models is intractable. Therefore, we propose efficient diffusion policy (EDP) to overcome these two challenges. EDP approximately constructs actions from corrupted ones at training to avoid running the sampling chain. We conduct extensive experiments on the D4RL benchmark. The results show that EDP can reduce the diffusion policy training time from 5 days to 5 hours on gym-locomotion tasks. Moreover, we show that EDP is compatible with various offline RL algorithms (TD3, CRR, and IQL) and achieves new state-of-the-art on D4RL by large margins over previous methods. Our code is available at https://github.com/sail-sg/edp.
    Multitask Online Learning: Listen to the Neighborhood Buzz. (arXiv:2310.17385v1 [cs.LG])
    We study multitask online learning in a setting where agents can only exchange information with their neighbors on an arbitrary communication network. We introduce $\texttt{MT-CO}_2\texttt{OL}$, a decentralized algorithm for this setting whose regret depends on the interplay between the task similarities and the network structure. Our analysis shows that the regret of $\texttt{MT-CO}_2\texttt{OL}$ is never worse (up to constants) than the bound obtained when agents do not share information. On the other hand, our bounds significantly improve when neighboring agents operate on similar tasks. In addition, we prove that our algorithm can be made differentially private with a negligible impact on the regret when the losses are linear. Finally, we provide experimental support for our theory.
    Sketch-and-Project Meets Newton Method: Global $\mathcal O(k^{-2})$ Convergence with Low-Rank Updates. (arXiv:2305.13082v2 [math.OC] UPDATED)
    In this paper, we propose the first sketch-and-project Newton method with fast $\mathcal O(k^{-2})$ global convergence rate for self-concordant functions. Our method, SGN, can be viewed in three ways: i) as a sketch-and-project algorithm projecting updates of Newton method, ii) as a cubically regularized Newton ethod in sketched subspaces, and iii) as a damped Newton method in sketched subspaces. SGN inherits best of all three worlds: cheap iteration costs of sketch-and-project methods, state-of-the-art $\mathcal O(k^{-2})$ global convergence rate of full-rank Newton-like methods and the algorithm simplicity of damped Newton methods. Finally, we demonstrate its comparable empirical performance to baseline algorithms.
    Curvature Filtrations for Graph Generative Model Evaluation. (arXiv:2301.12906v3 [cs.LG] UPDATED)
    Graph generative model evaluation necessitates understanding differences between graphs on the distributional level. This entails being able to harness salient attributes of graphs in an efficient manner. Curvature constitutes one such property that has recently proved its utility in characterising graphs. Its expressive properties, stability, and practical utility in model evaluation remain largely unexplored, however. We combine graph curvature descriptors with emerging methods from topological data analysis to obtain robust, expressive descriptors for evaluating graph generative models.
    Bootstrapping Vision-Language Learning with Decoupled Language Pre-training. (arXiv:2307.07063v3 [cs.CV] UPDATED)
    We present a novel methodology aimed at optimizing the application of frozen large language models (LLMs) for resource-intensive vision-language (VL) pre-training. The current paradigm uses visual features as prompts to guide language models, with a focus on determining the most relevant visual features for corresponding text. Our approach diverges by concentrating on the language component, specifically identifying the optimal prompts to align with visual features. We introduce the Prompt-Transformer (P-Former), a model that predicts these ideal prompts, which is trained exclusively on linguistic data, bypassing the need for image-text pairings. This strategy subtly bifurcates the end-to-end VL training process into an additional, separate stage. Our experiments reveal that our framework significantly enhances the performance of a robust image-to-text baseline (BLIP-2), and effectively narrows the performance gap between models trained with either 4M or 129M image-text pairs. Importantly, our framework is modality-agnostic and flexible in terms of architectural design, as validated by its successful application in a video learning task using varied base modules. The code will be made available at https://github.com/yiren-jian/BLIText.
    Counterfactual-Augmented Importance Sampling for Semi-Offline Policy Evaluation. (arXiv:2310.17146v1 [cs.LG])
    In applying reinforcement learning (RL) to high-stakes domains, quantitative and qualitative evaluation using observational data can help practitioners understand the generalization performance of new policies. However, this type of off-policy evaluation (OPE) is inherently limited since offline data may not reflect the distribution shifts resulting from the application of new policies. On the other hand, online evaluation by collecting rollouts according to the new policy is often infeasible, as deploying new policies in these domains can be unsafe. In this work, we propose a semi-offline evaluation framework as an intermediate step between offline and online evaluation, where human users provide annotations of unobserved counterfactual trajectories. While tempting to simply augment existing data with such annotations, we show that this naive approach can lead to biased results. Instead, we design a new family of OPE estimators based on importance sampling (IS) and a novel weighting scheme that incorporate counterfactual annotations without introducing additional bias. We analyze the theoretical properties of our approach, showing its potential to reduce both bias and variance compared to standard IS estimators. Our analyses reveal important practical considerations for handling biased, noisy, or missing annotations. In a series of proof-of-concept experiments involving bandits and a healthcare-inspired simulator, we demonstrate that our approach outperforms purely offline IS estimators and is robust to imperfect annotations. Our framework, combined with principled human-centered design of annotation solicitation, can enable the application of RL in high-stakes domains.
    Sequential Memory with Temporal Predictive Coding. (arXiv:2305.11982v2 [q-bio.NC] UPDATED)
    Forming accurate memory of sequential stimuli is a fundamental function of biological agents. However, the computational mechanism underlying sequential memory in the brain remains unclear. Inspired by neuroscience theories and recent successes in applying predictive coding (PC) to \emph{static} memory tasks, in this work we propose a novel PC-based model for \emph{sequential} memory, called \emph{temporal predictive coding} (tPC). We show that our tPC models can memorize and retrieve sequential inputs accurately with a biologically plausible neural implementation. Importantly, our analytical study reveals that tPC can be viewed as a classical Asymmetric Hopfield Network (AHN) with an implicit statistical whitening process, which leads to more stable performance in sequential memory tasks of structured inputs. Moreover, we find that tPC exhibits properties consistent with behavioral observations and theories in neuroscience, thereby strengthening its biological relevance. Our work establishes a possible computational mechanism underlying sequential memory in the brain that can also be theoretically interpreted using existing memory model frameworks.
    Taming Gradient Variance in Federated Learning with Networked Control Variates. (arXiv:2310.17200v1 [cs.LG])
    Federated learning, a decentralized approach to machine learning, faces significant challenges such as extensive communication overheads, slow convergence, and unstable improvements. These challenges primarily stem from the gradient variance due to heterogeneous client data distributions. To address this, we introduce a novel Networked Control Variates (FedNCV) framework for Federated Learning. We adopt the REINFORCE Leave-One-Out (RLOO) as a fundamental control variate unit in the FedNCV framework, implemented at both client and server levels. At the client level, the RLOO control variate is employed to optimize local gradient updates, mitigating the variance introduced by data samples. Once relayed to the server, the RLOO-based estimator further provides an unbiased and low-variance aggregated gradient, leading to robust global updates. This dual-side application is formalized as a linear combination of composite control variates. We provide a mathematical expression capturing this integration of double control variates within FedNCV and present three theoretical results with corresponding proofs. This unique dual structure equips FedNCV to address data heterogeneity and scalability issues, thus potentially paving the way for large-scale applications. Moreover, we tested FedNCV on six diverse datasets under a Dirichlet distribution with {\alpha} = 0.1, and benchmarked its performance against six SOTA methods, demonstrating its superiority.
    On the Identifiability and Interpretability of Gaussian Process Models. (arXiv:2310.17023v1 [stat.ML])
    In this paper, we critically examine the prevalent practice of using additive mixtures of Mat\'ern kernels in single-output Gaussian process (GP) models and explore the properties of multiplicative mixtures of Mat\'ern kernels for multi-output GP models. For the single-output case, we derive a series of theoretical results showing that the smoothness of a mixture of Mat\'ern kernels is determined by the least smooth component and that a GP with such a kernel is effectively equivalent to the least smooth kernel component. Furthermore, we demonstrate that none of the mixing weights or parameters within individual kernel components are identifiable. We then turn our attention to multi-output GP models and analyze the identifiability of the covariance matrix $A$ in the multiplicative kernel $K(x,y) = AK_0(x,y)$, where $K_0$ is a standard single output kernel such as Mat\'ern. We show that $A$ is identifiable up to a multiplicative constant, suggesting that multiplicative mixtures are well suited for multi-output tasks. Our findings are supported by extensive simulations and real applications for both single- and multi-output settings. This work provides insight into kernel selection and interpretation for GP models, emphasizing the importance of choosing appropriate kernel structures for different tasks.
    Quantum Long Short-Term Memory (QLSTM) vs Classical LSTM in Time Series Forecasting: A Comparative Study in Solar Power Forecasting. (arXiv:2310.17032v1 [quant-ph])
    Accurately forecasting solar power generation is crucial in the global progression towards sustainable energy systems. In this study, we conduct a meticulous comparison between Quantum Long Short-Term Memory (QLSTM) and classical Long Short-Term Memory (LSTM) models for solar power production forecasting. Our controlled experiments reveal promising advantages of QLSTMs, including accelerated training convergence and substantially reduced test loss within the initial epoch compared to classical LSTMs. These empirical findings demonstrate QLSTM's potential to swiftly assimilate complex time series relationships, enabled by quantum phenomena like superposition. However, realizing QLSTM's full capabilities necessitates further research into model validation across diverse conditions, systematic hyperparameter optimization, hardware noise resilience, and applications to correlated renewable forecasting problems. With continued progress, quantum machine learning can offer a paradigm shift in renewable energy time series prediction. This pioneering work provides initial evidence substantiating quantum advantages over classical LSTM, while acknowledging present limitations. Through rigorous benchmarking grounded in real-world data, our study elucidates a promising trajectory for quantum learning in renewable forecasting. Additional research and development can further actualize this potential to achieve unprecedented accuracy and reliability in predicting solar power generation worldwide.
    An Explainable Deep Learning-Based Method For Schizophrenia Diagnosis Using Generative Data-Augmentation. (arXiv:2310.16867v1 [cs.LG])
    In this study, we leverage a deep learning-based method for the automatic diagnosis of schizophrenia using EEG brain recordings. This approach utilizes generative data augmentation, a powerful technique that enhances the accuracy of the diagnosis. To enable the utilization of time-frequency features, spectrograms were extracted from the raw signals. After exploring several neural network architectural setups, a proper convolutional neural network (CNN) was used for the initial diagnosis. Subsequently, using Wasserstein GAN with Gradient Penalty (WGAN-GP) and Variational Autoencoder (VAE), two different synthetic datasets were generated in order to augment the initial dataset and address the over-fitting issue. The augmented dataset using VAE achieved a 3.0\% improvement in accuracy reaching up to 99.0\% and yielded a lower loss value as well as a faster convergence. Finally, we addressed the lack of trust in black-box models using the Local Interpretable Model-agnostic Explanations (LIME) algorithm to determine the most important superpixels (frequencies) in the diagnosis process.
    The Significance of Machine Learning in Clinical Disease Diagnosis: A Review. (arXiv:2310.16978v1 [cs.LG])
    The global need for effective disease diagnosis remains substantial, given the complexities of various disease mechanisms and diverse patient symptoms. To tackle these challenges, researchers, physicians, and patients are turning to machine learning (ML), an artificial intelligence (AI) discipline, to develop solutions. By leveraging sophisticated ML and AI methods, healthcare stakeholders gain enhanced diagnostic and treatment capabilities. However, there is a scarcity of research focused on ML algorithms for enhancing the accuracy and computational efficiency. This research investigates the capacity of machine learning algorithms to improve the transmission of heart rate data in time series healthcare metrics, concentrating particularly on optimizing accuracy and efficiency. By exploring various ML algorithms used in healthcare applications, the review presents the latest trends and approaches in ML-based disease diagnosis (MLBDD). The factors under consideration include the algorithm utilized, the types of diseases targeted, the data types employed, the applications, and the evaluation metrics. This review aims to shed light on the prospects of ML in healthcare, particularly in disease diagnosis. By analyzing the current literature, the study provides insights into state-of-the-art methodologies and their performance metrics.
    Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control. (arXiv:2309.14597v2 [cs.LG] UPDATED)
    Deep reinforcement learning agents for continuous control are known to exhibit significant instability in their performance over time. In this work, we provide a fresh perspective on these behaviors by studying the return landscape: the mapping between a policy and a return. We find that popular algorithms traverse noisy neighborhoods of this landscape, in which a single update to the policy parameters leads to a wide range of returns. By taking a distributional view of these returns, we map the landscape, characterizing failure-prone regions of policy space and revealing a hidden dimension of policy quality. We show that the landscape exhibits surprising structure by finding simple paths in parameter space which improve the stability of a policy. To conclude, we develop a distribution-aware procedure which finds such paths, navigating away from noisy neighborhoods in order to improve the robustness of a policy. Taken together, our results provide new insight into the optimization, evaluation, and design of agents.
    Cause-Effect Inference in Location-Scale Noise Models: Maximum Likelihood vs. Independence Testing. (arXiv:2301.12930v3 [cs.LG] UPDATED)
    A fundamental problem of causal discovery is cause-effect inference, learning the correct causal direction between two random variables. Significant progress has been made through modelling the effect as a function of its cause and a noise term, which allows us to leverage assumptions about the generating function class. The recently introduced heteroscedastic location-scale noise functional models (LSNMs) combine expressive power with identifiability guarantees. LSNM model selection based on maximizing likelihood achieves state-of-the-art accuracy, when the noise distributions are correctly specified. However, through an extensive empirical evaluation, we demonstrate that the accuracy deteriorates sharply when the form of the noise distribution is misspecified by the user. Our analysis shows that the failure occurs mainly when the conditional variance in the anti-causal direction is smaller than that in the causal direction. As an alternative, we find that causal model selection through residual independence testing is much more robust to noise misspecification and misleading conditional variance.
    Local Advantage Networks for Cooperative Multi-Agent Reinforcement Learning. (arXiv:2112.12458v3 [cs.LG] UPDATED)
    Many recent successful off-policy multi-agent reinforcement learning (MARL) algorithms for cooperative partially observable environments focus on finding factorized value functions, leading to convoluted network structures. Building on the structure of independent Q-learners, our LAN algorithm takes a radically different approach, leveraging a dueling architecture to learn for each agent a decentralized best-response policies via individual advantage functions. The learning is stabilized by a centralized critic whose primary objective is to reduce the moving target problem of the individual advantages. The critic, whose network's size is independent of the number of agents, is cast aside after learning. Evaluation on the StarCraft II multi-agent challenge benchmark shows that LAN reaches state-of-the-art performance and is highly scalable with respect to the number of agents, opening up a promising alternative direction for MARL research.
    Oracle-Efficient Pessimism: Offline Policy Optimization in Contextual Bandits. (arXiv:2306.07923v2 [cs.LG] UPDATED)
    We consider offline policy optimization (OPO) in contextual bandits, where one is given a fixed dataset of logged interactions. While pessimistic regularizers are typically used to mitigate distribution shift, prior implementations thereof are either specialized or computationally inefficient. We present the first general oracle-efficient algorithm for pessimistic OPO: it reduces to supervised learning, leading to broad applicability. We obtain statistical guarantees analogous to those for prior pessimistic approaches. We instantiate our approach for both discrete and continuous actions and perform experiments in both settings, showing advantage over unregularized OPO across a wide range of configurations.
    BEVContrast: Self-Supervision in BEV Space for Automotive Lidar Point Clouds. (arXiv:2310.17281v1 [cs.CV])
    We present a surprisingly simple and efficient method for self-supervision of 3D backbone on automotive Lidar point clouds. We design a contrastive loss between features of Lidar scans captured in the same scene. Several such approaches have been proposed in the literature from PointConstrast, which uses a contrast at the level of points, to the state-of-the-art TARL, which uses a contrast at the level of segments, roughly corresponding to objects. While the former enjoys a great simplicity of implementation, it is surpassed by the latter, which however requires a costly pre-processing. In BEVContrast, we define our contrast at the level of 2D cells in the Bird's Eye View plane. Resulting cell-level representations offer a good trade-off between the point-level representations exploited in PointContrast and segment-level representations exploited in TARL: we retain the simplicity of PointContrast (cell representations are cheap to compute) while surpassing the performance of TARL in downstream semantic segmentation.
    Transformer-based Atmospheric Density Forecasting. (arXiv:2310.16912v1 [physics.ao-ph])
    As the peak of the solar cycle approaches in 2025 and the ability of a single geomagnetic storm to significantly alter the orbit of Resident Space Objects (RSOs), techniques for atmospheric density forecasting are vital for space situational awareness. While linear data-driven methods, such as dynamic mode decomposition with control (DMDc), have been used previously for forecasting atmospheric density, deep learning-based forecasting has the ability to capture nonlinearities in data. By learning multiple layer weights from historical atmospheric density data, long-term dependencies in the dataset are captured in the mapping between the current atmospheric density state and control input to the atmospheric density state at the next timestep. This work improves upon previous linear propagation methods for atmospheric density forecasting, by developing a nonlinear transformer-based architecture for atmospheric density forecasting. Empirical NRLMSISE-00 and JB2008, as well as physics-based TIEGCM atmospheric density models are compared for forecasting with DMDc and with the transformer-based propagator.  ( 2 min )
    Break it, Imitate it, Fix it: Robustness by Generating Human-Like Attacks. (arXiv:2310.16955v1 [cs.LG])
    Real-world natural language processing systems need to be robust to human adversaries. Collecting examples of human adversaries for training is an effective but expensive solution. On the other hand, training on synthetic attacks with small perturbations - such as word-substitution - does not actually improve robustness to human adversaries. In this paper, we propose an adversarial training framework that uses limited human adversarial examples to generate more useful adversarial examples at scale. We demonstrate the advantages of this system on the ANLI and hate speech detection benchmark datasets - both collected via an iterative, adversarial human-and-model-in-the-loop procedure. Compared to training only on observed human attacks, also training on our synthetic adversarial examples improves model robustness to future rounds. In ANLI, we see accuracy gains on the current set of attacks (44.1%$\,\to\,$50.1%) and on two future unseen rounds of human generated attacks (32.5%$\,\to\,$43.4%, and 29.4%$\,\to\,$40.2%). In hate speech detection, we see AUC gains on current attacks (0.76 $\to$ 0.84) and a future round (0.77 $\to$ 0.79). Attacks from methods that do not learn the distribution of existing human adversaries, meanwhile, degrade robustness.  ( 2 min )
    Probabilistic Integral Circuits. (arXiv:2310.16986v1 [cs.LG])
    Continuous latent variables (LVs) are a key ingredient of many generative models, as they allow modelling expressive mixtures with an uncountable number of components. In contrast, probabilistic circuits (PCs) are hierarchical discrete mixtures represented as computational graphs composed of input, sum and product units. Unlike continuous LV models, PCs provide tractable inference but are limited to discrete LVs with categorical (i.e. unordered) states. We bridge these model classes by introducing probabilistic integral circuits (PICs), a new language of computational graphs that extends PCs with integral units representing continuous LVs. In the first place, PICs are symbolic computational graphs and are fully tractable in simple cases where analytical integration is possible. In practice, we parameterise PICs with light-weight neural nets delivering an intractable hierarchical continuous mixture that can be approximated arbitrarily well with large PCs using numerical quadrature. On several distribution estimation benchmarks, we show that such PIC-approximating PCs systematically outperform PCs commonly learned via expectation-maximization or SGD.  ( 2 min )
    Improvement in Alzheimer's Disease MRI Images Analysis by Convolutional Neural Networks Via Topological Optimization. (arXiv:2310.16857v1 [eess.IV])
    This research underscores the efficacy of Fourier topological optimization in refining MRI imagery, thereby bolstering the classification precision of Alzheimer's Disease through convolutional neural networks. Recognizing that MRI scans are indispensable for neurological assessments, but frequently grapple with issues like blurriness and contrast irregularities, the deployment of Fourier topological optimization offered enhanced delineation of brain structures, ameliorated noise, and superior contrast. The applied techniques prioritized boundary enhancement, contrast and brightness adjustments, and overall image lucidity. Employing CNN architectures VGG16, ResNet50, InceptionV3, and Xception, the post-optimization analysis revealed a marked elevation in performance. Conclusively, the amalgamation of Fourier topological optimization with CNNs delineates a promising trajectory for the nuanced classification of Alzheimer's Disease, portending a transformative impact on its diagnostic paradigms.  ( 2 min )
    Squared Neural Families: A New Class of Tractable Density Models. (arXiv:2305.13552v2 [cs.LG] UPDATED)
    Flexible models for probability distributions are an essential ingredient in many machine learning tasks. We develop and investigate a new class of probability distributions, which we call a Squared Neural Family (SNEFY), formed by squaring the 2-norm of a neural network and normalising it with respect to a base measure. Following the reasoning similar to the well established connections between infinitely wide neural networks and Gaussian processes, we show that SNEFYs admit closed form normalising constants in many cases of interest, thereby resulting in flexible yet fully tractable density models. SNEFYs strictly generalise classical exponential families, are closed under conditioning, and have tractable marginal distributions. Their utility is illustrated on a variety of density estimation, conditional density estimation, and density estimation with missing data tasks.
    MimicTouch: Learning Human's Control Strategy with Multi-Modal Tactile Feedback. (arXiv:2310.16917v1 [cs.RO])
    In robotics and artificial intelligence, the integration of tactile processing is becoming increasingly pivotal, especially in learning to execute intricate tasks like alignment and insertion. However, existing works focusing on tactile methods for insertion tasks predominantly rely on robot teleoperation data and reinforcement learning, which do not utilize the rich insights provided by human's control strategy guided by tactile feedback. For utilizing human sensations, methodologies related to learning from humans predominantly leverage visual feedback, often overlooking the invaluable tactile feedback that humans inherently employ to finish complex manipulations. Addressing this gap, we introduce "MimicTouch", a novel framework that mimics human's tactile-guided control strategy. In this framework, we initially collect multi-modal tactile datasets from human demonstrators, incorporating human tactile-guided control strategies for task completion. The subsequent step involves instructing robots through imitation learning using multi-modal sensor data and retargeted human motions. To further mitigate the embodiment gap between humans and robots, we employ online residual reinforcement learning on the physical robot. Through comprehensive experiments, we validate the safety of MimicTouch in transferring a latent policy learned through imitation learning from human to robot. This ongoing work will pave the way for a broader spectrum of tactile-guided robotic applications.  ( 2 min )
    Zephyr: Direct Distillation of LM Alignment. (arXiv:2310.16944v1 [cs.LG])
    We aim to produce a smaller language model that is aligned to user intent. Previous research has shown that applying distilled supervised fine-tuning (dSFT) on larger models significantly improves task accuracy; however, these models are unaligned, i.e. they do not respond well to natural prompts. To distill this property, we experiment with the use of preference data from AI Feedback (AIF). Starting from a dataset of outputs ranked by a teacher model, we apply distilled direct preference optimization (dDPO) to learn a chat model with significantly improved intent alignment. The approach requires only a few hours of training without any additional sampling during fine-tuning. The final result, Zephyr-7B, sets the state-of-the-art on chat benchmarks for 7B parameter models, and requires no human annotation. In particular, results on MT-Bench show that Zephyr-7B surpasses Llama2-Chat-70B, the best open-access RLHF-based model. Code, models, data, and tutorials for the system are available at https://github.com/huggingface/alignment-handbook.  ( 2 min )
    Improving Few-shot Generalization of Safety Classifiers via Data Augmented Parameter-Efficient Fine-Tuning. (arXiv:2310.16959v1 [cs.LG])
    As large language models (LLMs) are widely adopted, new safety issues and policies emerge, to which existing safety classifiers do not generalize well. If we have only observed a few examples of violations of a new safety rule, how can we build a classifier to detect violations? In this paper, we study the novel setting of domain-generalized few-shot learning for LLM-based text safety classifiers. Unlike prior few-shot work, these new safety issues can be hard to uncover and we do not get to choose the few examples. We demonstrate that existing few-shot techniques do not perform well in this setting, and rather we propose to do parameter-efficient fine-tuning (PEFT) combined with augmenting training data based on similar examples in prior existing rules. We empirically show that our approach of similarity-based data-augmentation + prompt-tuning (DAPT) consistently outperforms baselines that either do not rely on data augmentation or on PEFT by 7-17% F1 score in the Social Chemistry moral judgement and 9-13% AUC in the Toxicity detection tasks, even when the new rule is loosely correlated with existing ones.  ( 2 min )
    Causal Q-Aggregation for CATE Model Selection. (arXiv:2310.16945v1 [stat.ML])
    Accurate estimation of conditional average treatment effects (CATE) is at the core of personalized decision making. While there is a plethora of models for CATE estimation, model selection is a nontrivial task, due to the fundamental problem of causal inference. Recent empirical work provides evidence in favor of proxy loss metrics with double robust properties and in favor of model ensembling. However, theoretical understanding is lacking. Direct application of prior theoretical work leads to suboptimal oracle model selection rates due to the non-convexity of the model selection problem. We provide regret rates for the major existing CATE ensembling approaches and propose a new CATE model ensembling approach based on Q-aggregation using the doubly robust loss. Our main result shows that causal Q-aggregation achieves statistically optimal oracle model selection regret rates of $\frac{\log(M)}{n}$ (with $M$ models and $n$ samples), with the addition of higher-order estimation error terms related to products of errors in the nuisance functions. Crucially, our regret rate does not require that any of the candidate CATE models be close to the truth. We validate our new method on many semi-synthetic datasets and also provide extensions of our work to CATE model selection with instrumental variables and unobserved confounding.  ( 2 min )
    Towards Continually Learning Application Performance Models. (arXiv:2310.16996v1 [cs.LG])
    Machine learning-based performance models are increasingly being used to build critical job scheduling and application optimization decisions. Traditionally, these models assume that data distribution does not change as more samples are collected over time. However, owing to the complexity and heterogeneity of production HPC systems, they are susceptible to hardware degradation, replacement, and/or software patches, which can lead to drift in the data distribution that can adversely affect the performance models. To this end, we develop continually learning performance models that account for the distribution drift, alleviate catastrophic forgetting, and improve generalizability. Our best model was able to retain accuracy, regardless of having to learn the new distribution of data inflicted by system changes, while demonstrating a 2x improvement in the prediction accuracy of the whole data sequence in comparison to the naive approach.  ( 2 min )
    Exploring Behavior Discovery Methods for Heterogeneous Swarms of Limited-Capability Robots. (arXiv:2310.16941v1 [cs.RO])
    We study the problem of determining the emergent behaviors that are possible given a functionally heterogeneous swarm of robots with limited capabilities. Prior work has considered behavior search for homogeneous swarms and proposed the use of novelty search over either a hand-specified or learned behavior space followed by clustering to return a taxonomy of emergent behaviors to the user. In this paper, we seek to better understand the role of novelty search and the efficacy of using clustering to discover novel emergent behaviors. Through a large set of experiments and ablations, we analyze the effect of representations, evolutionary search, and various clustering methods in the search for novel behaviors in a heterogeneous swarm. Our results indicate that prior methods fail to discover many interesting behaviors and that an iterative human-in-the-loop discovery process discovers more behaviors than random search, swarm chemistry, and automated behavior discovery. The combined discoveries of our experiments uncover 23 emergent behaviors, 18 of which are novel discoveries. To the best of our knowledge, these are the first known emergent behaviors for heterogeneous swarms of computation-free agents. Videos, code, and appendix are available at the project website: https://sites.google.com/view/heterogeneous-bd-methods  ( 2 min )
    General Point Model with Autoencoding and Autoregressive. (arXiv:2310.16861v1 [cs.LG])
    The pre-training architectures of large language models encompass various types, including autoencoding models, autoregressive models, and encoder-decoder models. We posit that any modality can potentially benefit from a large language model, as long as it undergoes vector quantization to become discrete tokens. Inspired by GLM, we propose a General Point Model (GPM) which seamlessly integrates autoencoding and autoregressive tasks in point cloud transformer. This model is versatile, allowing fine-tuning for downstream point cloud representation tasks, as well as unconditional and conditional generation tasks. GPM enhances masked prediction in autoencoding through various forms of mask padding tasks, leading to improved performance in point cloud understanding. Additionally, GPM demonstrates highly competitive results in unconditional point cloud generation tasks, even exhibiting the potential for conditional generation tasks by modifying the input's conditional information. Compared to models like Point-BERT, MaskPoint and PointMAE, our GPM achieves superior performance in point cloud understanding tasks. Furthermore, the integration of autoregressive and autoencoding within the same transformer underscores its versatility across different downstream tasks.  ( 2 min )
  • Open

    A Neural Collapse Perspective on Feature Evolution in Graph Neural Networks. (arXiv:2307.01951v2 [cs.LG] UPDATED)
    Graph neural networks (GNNs) have become increasingly popular for classification tasks on graph-structured data. Yet, the interplay between graph topology and feature evolution in GNNs is not well understood. In this paper, we focus on node-wise classification, illustrated with community detection on stochastic block model graphs, and explore the feature evolution through the lens of the "Neural Collapse" (NC) phenomenon. When training instance-wise deep classifiers (e.g. for image classification) beyond the zero training error point, NC demonstrates a reduction in the deepest features' within-class variability and an increased alignment of their class means to certain symmetric structures. We start with an empirical study that shows that a decrease in within-class variability is also prevalent in the node-wise classification setting, however, not to the extent observed in the instance-wise case. Then, we theoretically study this distinction. Specifically, we show that even an "optimistic" mathematical model requires that the graphs obey a strict structural condition in order to possess a minimizer with exact collapse. Interestingly, this condition is viable also for heterophilic graphs and relates to recent empirical studies on settings with improved GNNs' generalization. Furthermore, by studying the gradient dynamics of the theoretical model, we provide reasoning for the partial collapse observed empirically. Finally, we present a study on the evolution of within- and between-class feature variability across layers of a well-trained GNN and contrast the behavior with spectral methods.  ( 3 min )
    Assessing the overall and partial causal well-specification of nonlinear additive noise models. (arXiv:2310.16502v2 [stat.ME] UPDATED)
    We propose a method to detect model misspecifications in nonlinear causal additive and potentially heteroscedastic noise models. We aim to identify predictor variables for which we can infer the causal effect even in cases of such misspecification. We develop a general framework based on knowledge of the multivariate observational data distribution and we then propose an algorithm for finite sample data, discuss its asymptotic properties, and illustrate its performance on simulated and real data.
    Learning Rate Free Bayesian Inference in Constrained Domains. (arXiv:2305.14943v2 [stat.ML] UPDATED)
    We introduce a suite of new particle-based algorithms for sampling on constrained domains which are entirely learning rate free. Our approach leverages coin betting ideas from convex optimisation, and the viewpoint of constrained sampling as a mirrored optimisation problem on the space of probability measures. Based on this viewpoint, we also introduce a unifying framework for several existing constrained sampling algorithms, including mirrored Langevin dynamics and mirrored Stein variational gradient descent. We demonstrate the performance of our algorithms on a range of numerical examples, including sampling from targets on the simplex, sampling with fairness constraints, and constrained sampling problems in post-selection inference. Our results indicate that our algorithms achieve competitive performance with existing constrained sampling methods, without the need to tune any hyperparameters.
    Multi-scale Diffusion Denoised Smoothing. (arXiv:2310.16779v2 [cs.LG] UPDATED)
    Along with recent diffusion models, randomized smoothing has become one of a few tangible approaches that offers adversarial robustness to models at scale, e.g., those of large pre-trained models. Specifically, one can perform randomized smoothing on any classifier via a simple "denoise-and-classify" pipeline, so-called denoised smoothing, given that an accurate denoiser is available - such as diffusion model. In this paper, we present scalable methods to address the current trade-off between certified robustness and accuracy in denoised smoothing. Our key idea is to "selectively" apply smoothing among multiple noise scales, coined multi-scale smoothing, which can be efficiently implemented with a single diffusion model. This approach also suggests a new objective to compare the collective robustness of multi-scale smoothed classifiers, and questions which representation of diffusion model would maximize the objective. To address this, we propose to further fine-tune diffusion model (a) to perform consistent denoising whenever the original image is recoverable, but (b) to generate rather diverse outputs otherwise. Our experiments show that the proposed multi-scale smoothing scheme combined with diffusion fine-tuning enables strong certified robustness available with high noise level while maintaining its accuracy closer to non-smoothed classifiers.  ( 2 min )
    Robust Output Analysis with Monte-Carlo Methodology. (arXiv:2207.13612v3 [stat.ME] CROSS LISTED)
    In predictive modeling with simulation or machine learning, it is critical to accurately assess the quality of estimated values through output analysis. In recent decades output analysis has become enriched with methods that quantify the impact of input data uncertainty in the model outputs to increase robustness. However, most developments are applicable assuming that the input data adheres to a parametric family of distributions. We propose a unified output analysis framework for simulation and machine learning outputs through the lens of Monte Carlo sampling. This framework provides nonparametric quantification of the variance and bias induced in the outputs with higher-order accuracy. Our new bias-corrected estimation from the model outputs leverages the extension of fast iterative bootstrap sampling and higher-order influence functions. For the scalability of the proposed estimation methods, we devise budget-optimal rules and leverage control variates for variance reduction. Our theoretical and numerical results demonstrate a clear advantage in building more robust confidence intervals from the model outputs with higher coverage probability.  ( 2 min )
    A Mean Field Approach to Empirical Bayes Estimation in High-dimensional Linear Regression. (arXiv:2309.16843v2 [math.ST] UPDATED)
    We study empirical Bayes estimation in high-dimensional linear regression. To facilitate computationally efficient estimation of the underlying prior, we adopt a variational empirical Bayes approach, introduced originally in Carbonetto and Stephens (2012) and Kim et al. (2022). We establish asymptotic consistency of the nonparametric maximum likelihood estimator (NPMLE) and its (computable) naive mean field variational surrogate under mild assumptions on the design and the prior. Assuming, in addition, that the naive mean field approximation has a dominant optimizer, we develop a computationally efficient approximation to the oracle posterior distribution, and establish its accuracy under the 1-Wasserstein metric. This enables computationally feasible Bayesian inference; e.g., construction of posterior credible intervals with an average coverage guarantee, Bayes optimal estimation for the regression coefficients, estimation of the proportion of non-nulls, etc. Our analysis covers both deterministic and random designs, and accommodates correlations among the features. To the best of our knowledge, this provides the first rigorous nonparametric empirical Bayes method in a high-dimensional regression setting without sparsity.  ( 2 min )
    Statistically Valid Variable Importance Assessment through Conditional Permutations. (arXiv:2309.07593v2 [cs.LG] UPDATED)
    Variable importance assessment has become a crucial step in machine-learning applications when using complex learners, such as deep neural networks, on large-scale data. Removal-based importance assessment is currently the reference approach, particularly when statistical guarantees are sought to justify variable inclusion. It is often implemented with variable permutation schemes. On the flip side, these approaches risk misidentifying unimportant variables as important in the presence of correlations among covariates. Here we develop a systematic approach for studying Conditional Permutation Importance (CPI) that is model agnostic and computationally lean, as well as reusable benchmarks of state-of-the-art variable importance estimators. We show theoretically and empirically that $\textit{CPI}$ overcomes the limitations of standard permutation importance by providing accurate type-I error control. When used with a deep neural network, $\textit{CPI}$ consistently showed top accuracy across benchmarks. An experiment on real-world data analysis in a large-scale medical dataset showed that $\textit{CPI}$ provides a more parsimonious selection of statistically significant variables. Our results suggest that $\textit{CPI}$ can be readily used as drop-in replacement for permutation-based methods.  ( 3 min )
    Unconstrained Dynamic Regret via Sparse Coding. (arXiv:2301.13349v5 [cs.LG] UPDATED)
    Motivated by the challenge of nonstationarity in sequential decision making, we study Online Convex Optimization (OCO) under the coupling of two problem structures: the domain is unbounded, and the comparator sequence $u_1,\ldots,u_T$ is arbitrarily time-varying. As no algorithm can guarantee low regret simultaneously against all comparator sequences, handling this setting requires moving from minimax optimality to comparator adaptivity. That is, sensible regret bounds should depend on certain complexity measures of the comparator relative to one's prior knowledge. This paper achieves a new type of these adaptive regret bounds via a sparse coding framework. The complexity of the comparator is measured by its energy and its sparsity on a user-specified dictionary, which offers considerable versatility. Equipped with a wavelet dictionary for example, our framework improves the state-of-the-art bound (Jacobsen & Cutkosky, 2022) by adapting to both ($i$) the magnitude of the comparator average $||\bar u||=||\sum_{t=1}^Tu_t/T||$, rather than the maximum $\max_t||u_t||$; and ($ii$) the comparator variability $\sum_{t=1}^T||u_t-\bar u||$, rather than the uncentered sum $\sum_{t=1}^T||u_t||$. Furthermore, our analysis is simpler due to decoupling function approximation from regret minimization.  ( 3 min )
    Robust Covariate Shift Adaptation for Density-Ratio Estimation. (arXiv:2310.16638v2 [stat.ME] UPDATED)
    Consider a scenario where we have access to train data with both covariates and outcomes while test data only contains covariates. In this scenario, our primary aim is to predict the missing outcomes of the test data. With this objective in mind, we train parametric regression models under a covariate shift, where covariate distributions are different between the train and test data. For this problem, existing studies have proposed covariate shift adaptation via importance weighting using the density ratio. This approach averages the train data losses, each weighted by an estimated ratio of the covariate densities between the train and test data, to approximate the test-data risk. Although it allows us to obtain a test-data risk minimizer, its performance heavily relies on the accuracy of the density ratio estimation. Moreover, even if the density ratio can be consistently estimated, the estimation errors of the density ratio also yield bias in the estimators of the regression model's parameters of interest. To mitigate these challenges, we introduce a doubly robust estimator for covariate shift adaptation via importance weighting, which incorporates an additional estimator for the regression function. Leveraging double machine learning techniques, our estimator reduces the bias arising from the density ratio estimation errors. We demonstrate the asymptotic distribution of the regression parameter estimator. Notably, our estimator remains consistent if either the density ratio estimator or the regression function is consistent, showcasing its robustness against potential errors in density ratio estimation. Finally, we confirm the soundness of our proposed method via simulation studies.
    Cause-Effect Inference in Location-Scale Noise Models: Maximum Likelihood vs. Independence Testing. (arXiv:2301.12930v3 [cs.LG] UPDATED)
    A fundamental problem of causal discovery is cause-effect inference, learning the correct causal direction between two random variables. Significant progress has been made through modelling the effect as a function of its cause and a noise term, which allows us to leverage assumptions about the generating function class. The recently introduced heteroscedastic location-scale noise functional models (LSNMs) combine expressive power with identifiability guarantees. LSNM model selection based on maximizing likelihood achieves state-of-the-art accuracy, when the noise distributions are correctly specified. However, through an extensive empirical evaluation, we demonstrate that the accuracy deteriorates sharply when the form of the noise distribution is misspecified by the user. Our analysis shows that the failure occurs mainly when the conditional variance in the anti-causal direction is smaller than that in the causal direction. As an alternative, we find that causal model selection through residual independence testing is much more robust to noise misspecification and misleading conditional variance.  ( 2 min )
    Trajectory Alignment: Understanding the Edge of Stability Phenomenon via Bifurcation Theory. (arXiv:2307.04204v2 [cs.LG] UPDATED)
    Cohen et al. (2021) empirically study the evolution of the largest eigenvalue of the loss Hessian, also known as sharpness, along the gradient descent (GD) trajectory and observe the Edge of Stability (EoS) phenomenon. The sharpness increases at the early phase of training (referred to as progressive sharpening), and eventually saturates close to the threshold of $2 / \text{(step size)}$. In this paper, we start by demonstrating through empirical studies that when the EoS phenomenon occurs, different GD trajectories (after a proper reparameterization) align on a specific bifurcation diagram independent of initialization. We then rigorously prove this trajectory alignment phenomenon for a two-layer fully-connected linear network and a single-neuron nonlinear network trained with a single data point. Our trajectory alignment analysis establishes both progressive sharpening and EoS phenomena, encompassing and extending recent findings in the literature.  ( 2 min )
    Statistical Component Separation for Targeted Signal Recovery in Noisy Mixtures. (arXiv:2306.15012v2 [stat.ML] UPDATED)
    Separating signals from an additive mixture may be an unnecessarily hard problem when one is only interested in specific properties of a given signal. In this work, we tackle simpler "statistical component separation" problems that focus on recovering a predefined set of statistical descriptors of a target signal from a noisy mixture. Assuming access to samples of the noise process, we investigate a method devised to match the statistics of the solution candidate corrupted by noise samples with those of the observed mixture. We first analyze the behavior of this method using simple examples with analytically tractable calculations. Then, we apply it in an image denoising context employing 1) wavelet-based descriptors, 2) ConvNet-based descriptors on astrophysics and ImageNet data. In the case of 1), we show that our method better recovers the descriptors of the target data than a standard denoising method in most situations. Additionally, despite not constructed for this purpose, it performs surprisingly well in terms of peak signal-to-noise ratio on full signal reconstruction. In comparison, representation 2) appears less suitable for image denoising. Finally, we extend this method by introducing a diffusive stepwise algorithm which gives a new perspective to the initial method and leads to promising results for image denoising under specific circumstances.  ( 3 min )
    Mind the spikes: Benign overfitting of kernels and neural networks in fixed dimension. (arXiv:2305.14077v2 [stat.ML] UPDATED)
    The success of over-parameterized neural networks trained to near-zero training error has caused great interest in the phenomenon of benign overfitting, where estimators are statistically consistent even though they interpolate noisy training data. While benign overfitting in fixed dimension has been established for some learning methods, current literature suggests that for regression with typical kernel methods and wide neural networks, benign overfitting requires a high-dimensional setting where the dimension grows with the sample size. In this paper, we show that the smoothness of the estimators, and not the dimension, is the key: benign overfitting is possible if and only if the estimator's derivatives are large enough. We generalize existing inconsistency results to non-interpolating models and more kernels to show that benign overfitting with moderate derivatives is impossible in fixed dimension. Conversely, we show that rate-optimal benign overfitting is possible for regression with a sequence of spiky-smooth kernels with large derivatives. Using neural tangent kernels, we translate our results to wide neural networks. We prove that while infinite-width networks do not overfit benignly with the ReLU activation, this can be fixed by adding small high-frequency fluctuations to the activation function. Our experiments verify that such neural networks, while overfitting, can indeed generalize well even on low-dimensional data sets.  ( 3 min )
    Label Embedding via Low-Coherence Matrices. (arXiv:2305.19470v3 [cs.LG] UPDATED)
    Label embedding is a framework for multiclass classification problems where each label is represented by a distinct vector of some fixed dimension, and training involves matching model output to the vector representing the correct label. While label embedding has been successfully applied in extreme classification and zero-shot learning, and offers both computational and statistical advantages, its theoretical foundations remain poorly understood. This work presents an analysis of label embedding in the context of extreme multiclass classification, where the number of classes $C$ is very large. We present an excess risk bound that reveals a trade-off between computational and statistical efficiency, quantified via the coherence of the embedding matrix. We further show that under the Massart noise condition, the statistical penalty for label embedding vanishes with sufficiently low coherence. Our analysis supports an algorithm that is simple, scalable, and easily parallelizable, and experimental results demonstrate its effectiveness in large-scale applications.  ( 2 min )
    Distribution-Free Model-Agnostic Regression Calibration via Nonparametric Methods. (arXiv:2305.12283v2 [cs.LG] UPDATED)
    In this paper, we consider the uncertainty quantification problem for regression models. Specifically, we consider an individual calibration objective for characterizing the quantiles of the prediction model. While such an objective is well-motivated from downstream tasks such as newsvendor cost, the existing methods have been largely heuristic and lack of statistical guarantee in terms of individual calibration. We show via simple examples that the existing methods focusing on population-level calibration guarantees such as average calibration or sharpness can lead to harmful and unexpected results. We propose simple nonparametric calibration methods that are agnostic of the underlying prediction model and enjoy both computational efficiency and statistical consistency. Our approach enables a better understanding of the possibility of individual calibration, and we establish matching upper and lower bounds for the calibration error of our proposed methods. Technically, our analysis combines the nonparametric analysis with a covering number argument for parametric analysis, which advances the existing theoretical analyses in the literature of nonparametric density estimation and quantile bandit problems. Importantly, the nonparametric perspective sheds new theoretical insights into regression calibration in terms of the curse of dimensionality and reconciles the existing results on the impossibility of individual calibration. To our knowledge, we make the first effort to reach both individual calibration and finite-sample guarantee with minimal assumptions in terms of conformal prediction. Numerical experiments show the advantage of such a simple approach under various metrics, and also under covariates shift. We hope our work provides a simple benchmark and a starting point of theoretical ground for future research on regression calibration.  ( 3 min )
    No-Regret Online Reinforcement Learning with Adversarial Losses and Transitions. (arXiv:2305.17380v3 [cs.LG] UPDATED)
    Existing online learning algorithms for adversarial Markov Decision Processes achieve ${O}(\sqrt{T})$ regret after $T$ rounds of interactions even if the loss functions are chosen arbitrarily by an adversary, with the caveat that the transition function has to be fixed. This is because it has been shown that adversarial transition functions make no-regret learning impossible. Despite such impossibility results, in this work, we develop algorithms that can handle both adversarial losses and adversarial transitions, with regret increasing smoothly in the degree of maliciousness of the adversary. More concretely, we first propose an algorithm that enjoys $\widetilde{{O}}(\sqrt{T} + C^{\textsf{P}})$ regret where $C^{\textsf{P}}$ measures how adversarial the transition functions are and can be at most ${O}(T)$. While this algorithm itself requires knowledge of $C^{\textsf{P}}$, we further develop a black-box reduction approach that removes this requirement. Moreover, we also show that further refinements of the algorithm not only maintains the same regret bound, but also simultaneously adapts to easier environments (where losses are generated in a certain stochastically constrained manner as in Jin et al. [2021]) and achieves $\widetilde{{O}}(U + \sqrt{UC^{\textsf{L}}} + C^{\textsf{P}})$ regret, where $U$ is some standard gap-dependent coefficient and $C^{\textsf{L}}$ is the amount of corruption on losses.  ( 3 min )
    Improved Best-of-Both-Worlds Guarantees for Multi-Armed Bandits: FTRL with General Regularizers and Multiple Optimal Arms. (arXiv:2302.13534v2 [cs.LG] UPDATED)
    We study the problem of designing adaptive multi-armed bandit algorithms that perform optimally in both the stochastic setting and the adversarial setting simultaneously (often known as a best-of-both-world guarantee). A line of recent works shows that when configured and analyzed properly, the Follow-the-Regularized-Leader (FTRL) algorithm, originally designed for the adversarial setting, can in fact optimally adapt to the stochastic setting as well. Such results, however, critically rely on an assumption that there exists one unique optimal arm. Recently, Ito (2021) took the first step to remove such an undesirable uniqueness assumption for one particular FTRL algorithm with the $\frac{1}{2}$-Tsallis entropy regularizer. In this work, we significantly improve and generalize this result, showing that uniqueness is unnecessary for FTRL with a broad family of regularizers and a new learning rate schedule. For some regularizers, our regret bounds also improve upon prior results even when uniqueness holds. We further provide an application of our results to the decoupled exploration and exploitation problem, demonstrating that our techniques are broadly applicable.  ( 3 min )
    Monte Carlo guided Diffusion for Bayesian linear inverse problems. (arXiv:2308.07983v2 [stat.ML] UPDATED)
    Ill-posed linear inverse problems arise frequently in various applications, from computational photography to medical imaging. A recent line of research exploits Bayesian inference with informative priors to handle the ill-posedness of such problems. Amongst such priors, score-based generative models (SGM) have recently been successfully applied to several different inverse problems. In this study, we exploit the particular structure of the prior defined by the SGM to define a sequence of intermediate linear inverse problems. As the noise level decreases, the posteriors of these inverse problems get closer to the target posterior of the original inverse problem. To sample from this sequence of posteriors, we propose the use of Sequential Monte Carlo (SMC) methods. The proposed algorithm, MCGDiff, is shown to be theoretically grounded and we provide numerical simulations showing that it outperforms competing baselines when dealing with ill-posed inverse problems in a Bayesian setting.  ( 2 min )
    Hierarchical clustering with OWA-based linkages, the Lance-Williams formula, and dendrogram inversions. (arXiv:2303.05683v2 [stat.ML] UPDATED)
    Agglomerative hierarchical clustering based on Ordered Weighted Averaging (OWA) operators not only generalises the single, complete, and average linkages, but also includes intercluster distances based on a few nearest or farthest neighbours, trimmed and winsorised means of pairwise point similarities, amongst many others. We explore the relationships between the famous Lance-Williams update formula and the extended OWA-based linkages with weights generated via infinite coefficient sequences. Furthermore, we provide some conditions for the weight generators to guarantee the resulting dendrograms to be free from unaesthetic inversions.  ( 2 min )
    RS-Del: Edit Distance Robustness Certificates for Sequence Classifiers via Randomized Deletion. (arXiv:2302.01757v2 [cs.CR] UPDATED)
    Randomized smoothing is a leading approach for constructing classifiers that are certifiably robust against adversarial examples. Existing work on randomized smoothing has focused on classifiers with continuous inputs, such as images, where $\ell_p$-norm bounded adversaries are commonly studied. However, there has been limited work for classifiers with discrete or variable-size inputs, such as for source code, which require different threat models and smoothing mechanisms. In this work, we adapt randomized smoothing for discrete sequence classifiers to provide certified robustness against edit distance-bounded adversaries. Our proposed smoothing mechanism randomized deletion (RS-Del) applies random deletion edits, which are (perhaps surprisingly) sufficient to confer robustness against adversarial deletion, insertion and substitution edits. Our proof of certification deviates from the established Neyman-Pearson approach, which is intractable in our setting, and is instead organized around longest common subsequences. We present a case study on malware detection--a binary classification problem on byte sequences where classifier evasion is a well-established threat model. When applied to the popular MalConv malware detection model, our smoothing mechanism RS-Del achieves a certified accuracy of 91% at an edit distance radius of 128 bytes.  ( 3 min )
    Mutual information of spin systems from autoregressive neural networks. (arXiv:2304.13412v2 [cond-mat.stat-mech] UPDATED)
    We describe a new direct method to estimate bipartite mutual information of a classical spin system based on Monte Carlo sampling enhanced by autoregressive neural networks. It allows studying arbitrary geometries of subsystems and can be generalized to classical field theories. We demonstrate it on the Ising model for four partitionings, including a multiply-connected even-odd division. We show that the area law is satisfied for temperatures away from the critical temperature: the constant term is universal, whereas the proportionality coefficient is different for the even-odd partitioning.  ( 2 min )
    Gaussian Membership Inference Privacy. (arXiv:2306.07273v2 [cs.LG] UPDATED)
    We propose a novel and practical privacy notion called $f$-Membership Inference Privacy ($f$-MIP), which explicitly considers the capabilities of realistic adversaries under the membership inference attack threat model. Consequently, $f$-MIP offers interpretable privacy guarantees and improved utility (e.g., better classification accuracy). In particular, we derive a parametric family of $f$-MIP guarantees that we refer to as $\mu$-Gaussian Membership Inference Privacy ($\mu$-GMIP) by theoretically analyzing likelihood ratio-based membership inference attacks on stochastic gradient descent (SGD). Our analysis highlights that models trained with standard SGD already offer an elementary level of MIP. Additionally, we show how $f$-MIP can be amplified by adding noise to gradient updates. Our analysis further yields an analytical membership inference attack that offers two distinct advantages over previous approaches. First, unlike existing state-of-the-art attacks that require training hundreds of shadow models, our attack does not require any shadow model. Second, our analytical attack enables straightforward auditing of our privacy notion $f$-MIP. Finally, we quantify how various hyperparameters (e.g., batch size, number of model parameters) and specific data characteristics determine an attacker's ability to accurately infer a point's membership in the training set. We demonstrate the effectiveness of our method on models trained on vision and tabular datasets.  ( 2 min )
    A Batch-to-Online Transformation under Random-Order Model. (arXiv:2306.07163v2 [cs.LG] UPDATED)
    We introduce a transformation framework that can be utilized to develop online algorithms with low $\epsilon$-approximate regret in the random-order model from offline approximation algorithms. We first give a general reduction theorem that transforms an offline approximation algorithm with low average sensitivity to an online algorithm with low $\epsilon$-approximate regret. We then demonstrate that offline approximation algorithms can be transformed into a low-sensitivity version using a coreset construction method. To showcase the versatility of our approach, we apply it to various problems, including online $(k,z)$-clustering, online matrix approximation, and online regression, and successfully achieve polylogarithmic $\epsilon$-approximate regret for each problem. Moreover, we show that in all three cases, our algorithm also enjoys low inconsistency, which may be desired in some online applications.  ( 2 min )
    Instability of computer vision models is a necessary result of the task itself. (arXiv:2310.17559v1 [cs.CV])
    Adversarial examples resulting from instability of current computer vision models are an extremely important topic due to their potential to compromise any application. In this paper we demonstrate that instability is inevitable due to a) symmetries (translational invariance) of the data, b) the categorical nature of the classification task, and c) the fundamental discrepancy of classifying images as objects themselves. The issue is further exacerbated by non-exhaustive labelling of the training data. Therefore we conclude that instability is a necessary result of how the problem of computer vision is currently formulated. While the problem cannot be eliminated, through the analysis of the causes, we have arrived at ways how it can be partially alleviated. These include i) increasing the resolution of images, ii) providing contextual information for the image, iii) exhaustive labelling of training data, and iv) preventing attackers from frequent access to the computer vision system.  ( 2 min )
    Convergence of flow-based generative models via proximal gradient descent in Wasserstein space. (arXiv:2310.17582v1 [stat.ML])
    Flow-based generative models enjoy certain advantages in computing the data generation and the likelihood, and have recently shown competitive empirical performance. Compared to the accumulating theoretical studies on related score-based diffusion models, analysis of flow-based models, which are deterministic in both forward (data-to-noise) and reverse (noise-to-data) directions, remain sparse. In this paper, we provide a theoretical guarantee of generating data distribution by a progressive flow model, the so-called JKO flow model, which implements the Jordan-Kinderleherer-Otto (JKO) scheme in a normalizing flow network. Leveraging the exponential convergence of the proximal gradient descent (GD) in Wasserstein space, we prove the Kullback-Leibler (KL) guarantee of data generation by a JKO flow model to be $O(\varepsilon^2)$ when using $N \lesssim \log (1/\varepsilon)$ many JKO steps ($N$ Residual Blocks in the flow) where $\varepsilon $ is the error in the per-step first-order condition. The assumption on data density is merely a finite second moment, and the theory extends to data distributions without density and when there are inversion errors in the reverse process where we obtain KL-$W_2$ mixed error guarantees. The non-asymptotic convergence rate of the JKO-type $W_2$-proximal GD is proved for a general class of convex objective functionals that includes the KL divergence as a special case, which can be of independent interest.  ( 3 min )
    Bounding Box-based Multi-objective Bayesian Optimization of Risk Measures under Input Uncertainty. (arXiv:2301.11588v2 [stat.ML] UPDATED)
    In this study, we propose a novel multi-objective Bayesian optimization (MOBO) method to efficiently identify the Pareto front (PF) defined by risk measures for black-box functions under the presence of input uncertainty (IU). Existing BO methods for Pareto optimization in the presence of IU are risk-specific or without theoretical guarantees, whereas our proposed method addresses general risk measures and has theoretical guarantees. The basic idea of the proposed method is to assume a Gaussian process (GP) model for the black-box function and to construct high-probability bounding boxes for the risk measures using the GP model. Furthermore, in order to reduce the uncertainty of non-dominated bounding boxes, we propose a method of selecting the next evaluation point using a maximin distance defined by the maximum value of a quasi distance based on bounding boxes. As theoretical analysis, we prove that the algorithm can return an arbitrary-accurate solution in a finite number of iterations with high probability, for various risk measures such as Bayes risk, worst-case risk, and value-at-risk. We also give a theoretical analysis that takes into account approximation errors because there exist non-negligible approximation errors (e.g., finite approximation of PFs and sampling-based approximation of bounding boxes) in practice. We confirm that the proposed method outperforms compared with existing methods not only in the setting with IU but also in the setting of ordinary MOBO through numerical experiments.  ( 3 min )
    Curvature Filtrations for Graph Generative Model Evaluation. (arXiv:2301.12906v3 [cs.LG] UPDATED)
    Graph generative model evaluation necessitates understanding differences between graphs on the distributional level. This entails being able to harness salient attributes of graphs in an efficient manner. Curvature constitutes one such property that has recently proved its utility in characterising graphs. Its expressive properties, stability, and practical utility in model evaluation remain largely unexplored, however. We combine graph curvature descriptors with emerging methods from topological data analysis to obtain robust, expressive descriptors for evaluating graph generative models.  ( 2 min )
    The statistical thermodynamics of generative diffusion models. (arXiv:2310.17467v1 [stat.ML])
    Generative diffusion models have achieved spectacular performance in many areas of generative modeling. While the fundamental ideas behind these models come from non-equilibrium physics, in this paper we show that many aspects of these models can be understood using the tools of equilibrium statistical mechanics. Using this reformulation, we show that generative diffusion models undergo second-order phase transitions corresponding to symmetry breaking phenomena. We argue that this lead to a form of instability that lies at the heart of their generative capabilities and that can be described by a set of mean field critical exponents. We conclude by analyzing recent work connecting diffusion models and associative memory networks in view of the thermodynamic formulations.  ( 2 min )
    Sequential Memory with Temporal Predictive Coding. (arXiv:2305.11982v2 [q-bio.NC] UPDATED)
    Forming accurate memory of sequential stimuli is a fundamental function of biological agents. However, the computational mechanism underlying sequential memory in the brain remains unclear. Inspired by neuroscience theories and recent successes in applying predictive coding (PC) to \emph{static} memory tasks, in this work we propose a novel PC-based model for \emph{sequential} memory, called \emph{temporal predictive coding} (tPC). We show that our tPC models can memorize and retrieve sequential inputs accurately with a biologically plausible neural implementation. Importantly, our analytical study reveals that tPC can be viewed as a classical Asymmetric Hopfield Network (AHN) with an implicit statistical whitening process, which leads to more stable performance in sequential memory tasks of structured inputs. Moreover, we find that tPC exhibits properties consistent with behavioral observations and theories in neuroscience, thereby strengthening its biological relevance. Our work establishes a possible computational mechanism underlying sequential memory in the brain that can also be theoretically interpreted using existing memory model frameworks.  ( 2 min )
    Small Total-Cost Constraints in Contextual Bandits with Knapsacks, with Application to Fairness. (arXiv:2305.15807v2 [stat.ML] UPDATED)
    We consider contextual bandit problems with knapsacks [CBwK], a problem where at each round, a scalar reward is obtained and vector-valued costs are suffered. The learner aims to maximize the cumulative rewards while ensuring that the cumulative costs are lower than some predetermined cost constraints. We assume that contexts come from a continuous set, that costs can be signed, and that the expected reward and cost functions, while unknown, may be uniformly estimated -- a typical assumption in the literature. In this setting, total cost constraints had so far to be at least of order $T^{3/4}$, where $T$ is the number of rounds, and were even typically assumed to depend linearly on $T$. We are however motivated to use CBwK to impose a fairness constraint of equalized average costs between groups: the budget associated with the corresponding cost constraints should be as close as possible to the natural deviations, of order $\sqrt{T}$. To that end, we introduce a dual strategy based on projected-gradient-descent updates, that is able to deal with total-cost constraints of the order of $\sqrt{T}$ up to poly-logarithmic terms. This strategy is more direct and simpler than existing strategies in the literature. It relies on a careful, adaptive, tuning of the step size.  ( 3 min )
    Benign Oscillation of Stochastic Gradient Descent with Large Learning Rates. (arXiv:2310.17074v1 [cs.LG])
    In this work, we theoretically investigate the generalization properties of neural networks (NN) trained by stochastic gradient descent (SGD) algorithm with large learning rates. Under such a training regime, our finding is that, the oscillation of the NN weights caused by the large learning rate SGD training turns out to be beneficial to the generalization of the NN, which potentially improves over the same NN trained by SGD with small learning rates that converges more smoothly. In view of this finding, we call such a phenomenon "benign oscillation". Our theory towards demystifying such a phenomenon builds upon the feature learning perspective of deep learning. Specifically, we consider a feature-noise data generation model that consists of (i) weak features which have a small $\ell_2$-norm and appear in each data point; (ii) strong features which have a larger $\ell_2$-norm but only appear in a certain fraction of all data points; and (iii) noise. We prove that NNs trained by oscillating SGD with a large learning rate can effectively learn the weak features in the presence of those strong features. In contrast, NNs trained by SGD with a small learning rate can only learn the strong features but makes little progress in learning the weak features. Consequently, when it comes to the new testing data which consist of only weak features, the NN trained by oscillating SGD with a large learning rate could still make correct predictions consistently, while the NN trained by small learning rate SGD fails. Our theory sheds light on how large learning rate training benefits the generalization of NNs. Experimental results demonstrate our finding on "benign oscillation".  ( 3 min )
    Unifying GANs and Score-Based Diffusion as Generative Particle Models. (arXiv:2305.16150v2 [cs.LG] UPDATED)
    Particle-based deep generative models, such as gradient flows and score-based diffusion models, have recently gained traction thanks to their striking performance. Their principle of displacing particle distributions using differential equations is conventionally seen as opposed to the previously widespread generative adversarial networks (GANs), which involve training a pushforward generator network. In this paper we challenge this interpretation, and propose a novel framework that unifies particle and adversarial generative models by framing generator training as a generalization of particle models. This suggests that a generator is an optional addition to any such generative model. Consequently, integrating a generator into a score-based diffusion model and training a GAN without a generator naturally emerge from our framework. We empirically test the viability of these original models as proofs of concepts of potential applications of our framework.  ( 2 min )
    Finite-time Analysis of Globally Nonstationary Multi-Armed Bandits. (arXiv:2107.11419v2 [stat.ML] UPDATED)
    We consider nonstationary multi-armed bandit problems where the model parameters of the arms change over time. We introduce the adaptive resetting bandit (ADR-bandit), a bandit algorithm class that leverages adaptive windowing techniques from literature on data streams. We first provide new guarantees on the quality of estimators resulting from adaptive windowing techniques, which are of independent interest. Furthermore, we conduct a finite-time analysis of ADR-bandit in two typical environments: an abrupt environment where changes occur instantaneously and a gradual environment where changes occur progressively. We demonstrate that ADR-bandit has nearly optimal performance when abrupt or gradual changes occur in a coordinated manner that we call global changes. We demonstrate that forced exploration is unnecessary when we assume such global changes. Unlike the existing nonstationary bandit algorithms, ADR-bandit has optimal performance in stationary environments as well as nonstationary environments with global changes. Our experiments show that the proposed algorithms outperform the existing approaches in synthetic and real-world environments.  ( 2 min )
    The Expressive Power of Low-Rank Adaptation. (arXiv:2310.17513v1 [cs.LG])
    Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method that leverages low-rank adaptation of weight matrices, has emerged as a prevalent technique for fine-tuning pre-trained models such as large language models and diffusion models. Despite its huge success in practice, the theoretical underpinnings of LoRA have largely remained unexplored. This paper takes the first step to bridge this gap by theoretically analyzing the expressive power of LoRA. We prove that, for fully connected neural networks, LoRA can adapt any model $f$ to accurately represent any smaller target model $\overline{f}$ if LoRA-rank $\geq(\text{width of }f) \times \frac{\text{depth of }\overline{f}}{\text{depth of }f}$. We also quantify the approximation error when LoRA-rank is lower than the threshold. For Transformer networks, we show any model can be adapted to a target model of the same size with rank-$(\frac{\text{embedding size}}{2})$ LoRA adapters.  ( 2 min )
    Squared Neural Families: A New Class of Tractable Density Models. (arXiv:2305.13552v2 [cs.LG] UPDATED)
    Flexible models for probability distributions are an essential ingredient in many machine learning tasks. We develop and investigate a new class of probability distributions, which we call a Squared Neural Family (SNEFY), formed by squaring the 2-norm of a neural network and normalising it with respect to a base measure. Following the reasoning similar to the well established connections between infinitely wide neural networks and Gaussian processes, we show that SNEFYs admit closed form normalising constants in many cases of interest, thereby resulting in flexible yet fully tractable density models. SNEFYs strictly generalise classical exponential families, are closed under conditioning, and have tractable marginal distributions. Their utility is illustrated on a variety of density estimation, conditional density estimation, and density estimation with missing data tasks.  ( 2 min )
    Generative Fractional Diffusion Models. (arXiv:2310.17638v1 [cs.LG])
    We generalize the continuous time framework for score-based generative models from an underlying Brownian motion (BM) to an approximation of fractional Brownian motion (FBM). We derive a continuous reparameterization trick and the reverse time model by representing FBM as a stochastic integral over a family of Ornstein-Uhlenbeck processes to define generative fractional diffusion models (GFDM) with driving noise converging to a non-Markovian process of infinite quadratic variation. The Hurst index $H\in(0,1)$ of FBM enables control of the roughness of the distribution transforming path. To the best of our knowledge, this is the first attempt to build a generative model upon a stochastic process with infinite quadratic variation.  ( 2 min )
    Improving Neural Additive Models with Bayesian Principles. (arXiv:2305.16905v2 [stat.ML] UPDATED)
    Neural additive models (NAMs) can improve the interpretability of deep neural networks by handling input features in separate additive sub-networks. However, they lack inherent mechanisms that provide calibrated uncertainties and enable selection of relevant features and interactions. Approaching NAMs from a Bayesian perspective, we enhance them in three primary ways, namely by a) providing credible intervals for the individual additive sub-networks; b) estimating the marginal likelihood to perform an implicit selection of features via an empirical Bayes procedure; and c) enabling a ranking of feature pairs as candidates for second-order interaction in fine-tuned models. In particular, we develop Laplace-approximated NAMs (LA-NAMs), which show improved empirical performance on tabular datasets and challenging real-world medical tasks.  ( 2 min )
    Looping in the Human: Collaborative and Explainable Bayesian Optimization. (arXiv:2310.17273v1 [cs.LG])
    Like many optimizers, Bayesian optimization often falls short of gaining user trust due to opacity. While attempts have been made to develop human-centric optimizers, they typically assume user knowledge is well-specified and error-free, employing users mainly as supervisors of the optimization process. We relax these assumptions and propose a more balanced human-AI partnership with our Collaborative and Explainable Bayesian Optimization (CoExBO) framework. Instead of explicitly requiring a user to provide a knowledge model, CoExBO employs preference learning to seamlessly integrate human insights into the optimization, resulting in algorithmic suggestions that resonate with user preference. CoExBO explains its candidate selection every iteration to foster trust, empowering users with a clearer grasp of the optimization. Furthermore, CoExBO offers a no-harm guarantee, allowing users to make mistakes; even with extreme adversarial interventions, the algorithm converges asymptotically to a vanilla Bayesian optimization. We validate CoExBO's efficacy through human-AI teaming experiments in lithium-ion battery design, highlighting substantial improvements over conventional methods.  ( 2 min )
    A framework for benchmarking clustering algorithms. (arXiv:2209.09493v3 [cs.LG] UPDATED)
    The evaluation of clustering algorithms can involve running them on a variety of benchmark problems, and comparing their outputs to the reference, ground-truth groupings provided by experts. Unfortunately, many research papers and graduate theses consider only a small number of datasets. Also, the fact that there can be many equally valid ways to cluster a given problem set is rarely taken into account. In order to overcome these limitations, we have developed a framework whose aim is to introduce a consistent methodology for testing clustering algorithms. Furthermore, we have aggregated, polished, and standardised many clustering benchmark dataset collections referred to across the machine learning and data mining literature, and included new datasets of different dimensionalities, sizes, and cluster types. An interactive datasets explorer, the documentation of the Python API, a description of the ways to interact with the framework from other programming languages such as R or MATLAB, and other details are all provided at .  ( 2 min )
    Kernel Stein Discrepancy thinning: a theoretical perspective of pathologies and a practical fix with regularization. (arXiv:2301.13528v3 [math.ST] UPDATED)
    Stein thinning is a promising algorithm proposed by (Riabiz et al., 2022) for post-processing outputs of Markov chain Monte Carlo (MCMC). The main principle is to greedily minimize the kernelized Stein discrepancy (KSD), which only requires the gradient of the log-target distribution, and is thus well-suited for Bayesian inference. The main advantages of Stein thinning are the automatic remove of the burn-in period, the correction of the bias introduced by recent MCMC algorithms, and the asymptotic properties of convergence towards the target distribution. Nevertheless, Stein thinning suffers from several empirical pathologies, which may result in poor approximations, as observed in the literature. In this article, we conduct a theoretical analysis of these pathologies, to clearly identify the mechanisms at stake, and suggest improved strategies. Then, we introduce the regularized Stein thinning algorithm to alleviate the identified pathologies. Finally, theoretical guarantees and extensive experiments show the high efficiency of the proposed algorithm. An implementation of regularized Stein thinning as the kernax library in python and JAX is available at https://gitlab.com/drti/kernax.  ( 3 min )
    Approximate Leave-one-out Cross Validation for Regression with $\ell_1$ Regularizers (extended version). (arXiv:2310.17629v1 [math.ST])
    The out-of-sample error (OO) is the main quantity of interest in risk estimation and model selection. Leave-one-out cross validation (LO) offers a (nearly) distribution-free yet computationally demanding approach to estimate OO. Recent theoretical work showed that approximate leave-one-out cross validation (ALO) is a computationally efficient and statistically reliable estimate of LO (and OO) for generalized linear models with differentiable regularizers. For problems involving non-differentiable regularizers, despite significant empirical evidence, the theoretical understanding of ALO's error remains unknown. In this paper, we present a novel theory for a wide class of problems in the generalized linear model family with non-differentiable regularizers. We bound the error |ALO - LO| in terms of intuitive metrics such as the size of leave-i-out perturbations in active sets, sample size n, number of features p and regularization parameters. As a consequence, for the $\ell_1$-regularized problems, we show that |ALO - LO| goes to zero as p goes to infinity while n/p and SNR are fixed and bounded.  ( 2 min )
    Risk-Averse Model Uncertainty for Distributionally Robust Safe Reinforcement Learning. (arXiv:2301.12593v2 [cs.LG] UPDATED)
    Many real-world domains require safe decision making in uncertain environments. In this work, we introduce a deep reinforcement learning framework for approaching this important problem. We consider a distribution over transition models, and apply a risk-averse perspective towards model uncertainty through the use of coherent distortion risk measures. We provide robustness guarantees for this framework by showing it is equivalent to a specific class of distributionally robust safe reinforcement learning problems. Unlike existing approaches to robustness in deep reinforcement learning, however, our formulation does not involve minimax optimization. This leads to an efficient, model-free implementation of our approach that only requires standard data collection from a single training environment. In experiments on continuous control tasks with safety constraints, we demonstrate that our framework produces robust performance and safety at deployment time across a range of perturbed test environments.  ( 2 min )
    Online Estimation and Community Detection of Network Point Processes for Event Streams. (arXiv:2009.01742v3 [cs.SI] UPDATED)
    A common goal in network modeling is to uncover the latent community structure present among nodes. For many real-world networks, the true connections consist of events arriving as streams, which are then aggregated to form edges, ignoring the dynamic temporal component. A natural way to take account of these temporal dynamics of interactions is to use point processes as the foundation of network models for community detection. Computational complexity hampers the scalability of such approaches to large sparse networks. To circumvent this challenge, we propose a fast online variational inference algorithm for estimating the latent structure underlying dynamic event arrivals on a network, using continuous-time point process latent network models. We describe this procedure for networks models capturing community structure. This structure can be learned as new events are observed on the network, updating the inferred community assignments. We investigate the theoretical properties of such an inference scheme, and provide regret bounds on the loss function of this procedure. The proposed inference procedure is then thoroughly compared, using both simulation studies and real data, to non-online variants. We demonstrate that online inference can obtain comparable performance, in terms of community recovery, to non-online variants, while realising computational gains. Our proposed inference framework can also be readily modified to incorporate other popular network structures.  ( 3 min )
    Uncovering Meanings of Embeddings via Partial Orthogonality. (arXiv:2310.17611v1 [cs.LG])
    Machine learning tools often rely on embedding text as vectors of real numbers. In this paper, we study how the semantic structure of language is encoded in the algebraic structure of such embeddings. Specifically, we look at a notion of ``semantic independence'' capturing the idea that, e.g., ``eggplant'' and ``tomato'' are independent given ``vegetable''. Although such examples are intuitive, it is difficult to formalize such a notion of semantic independence. The key observation here is that any sensible formalization should obey a set of so-called independence axioms, and thus any algebraic encoding of this structure should also obey these axioms. This leads us naturally to use partial orthogonality as the relevant algebraic structure. We develop theory and methods that allow us to demonstrate that partial orthogonality does indeed capture semantic independence. Complementary to this, we also introduce the concept of independence preserving embeddings where embeddings preserve the conditional independence structures of a distribution, and we prove the existence of such embeddings and approximations to them.  ( 2 min )
    Large-Scale Gaussian Processes via Alternating Projection. (arXiv:2310.17137v1 [cs.LG])
    Gaussian process (GP) hyperparameter optimization requires repeatedly solving linear systems with $n \times n$ kernel matrices. To address the prohibitive $\mathcal{O}(n^3)$ time complexity, recent work has employed fast iterative numerical methods, like conjugate gradients (CG). However, as datasets increase in magnitude, the corresponding kernel matrices become increasingly ill-conditioned and still require $\mathcal{O}(n^2)$ space without partitioning. Thus, while CG increases the size of datasets GPs can be trained on, modern datasets reach scales beyond its applicability. In this work, we propose an iterative method which only accesses subblocks of the kernel matrix, effectively enabling \emph{mini-batching}. Our algorithm, based on alternating projection, has $\mathcal{O}(n)$ per-iteration time and space complexity, solving many of the practical challenges of scaling GPs to very large datasets. Theoretically, we prove our method enjoys linear convergence and empirically we demonstrate its robustness to ill-conditioning. On large-scale benchmark datasets up to four million datapoints our approach accelerates training by a factor of 2$\times$ to 27$\times$ compared to CG.  ( 2 min )
    Learning an Inventory Control Policy with General Inventory Arrival Dynamics. (arXiv:2310.17168v1 [cs.LG])
    In this paper we address the problem of learning and backtesting inventory control policies in the presence of general arrival dynamics -- which we term as a quantity-over-time arrivals model (QOT). We also allow for order quantities to be modified as a post-processing step to meet vendor constraints such as order minimum and batch size constraints -- a common practice in real supply chains. To the best of our knowledge this is the first work to handle either arbitrary arrival dynamics or an arbitrary downstream post-processing of order quantities. Building upon recent work (Madeka et al., 2022) we similarly formulate the periodic review inventory control problem as an exogenous decision process, where most of the state is outside the control of the agent. Madeka et al. (2022) show how to construct a simulator that replays historic data to solve this class of problem. In our case, we incorporate a deep generative model for the arrivals process as part of the history replay. By formulating the problem as an exogenous decision process, we can apply results from Madeka et al. (2022) to obtain a reduction to supervised learning. Finally, we show via simulation studies that this approach yields statistically significant improvements in profitability over production baselines. Using data from an ongoing real-world A/B test, we show that Gen-QOT generalizes well to off-policy data.  ( 3 min )
    Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult. (arXiv:2310.17087v1 [cs.LG])
    Large learning rates, when applied to gradient descent for nonconvex optimization, yield various implicit biases including the edge of stability (Cohen et al., 2021), balancing (Wang et al., 2022), and catapult (Lewkowycz et al., 2020). These phenomena cannot be well explained by classical optimization theory. Though significant theoretical progress has been made in understanding these implicit biases, it remains unclear for which objective functions would they occur. This paper provides an initial step in answering this question, namely that these implicit biases are in fact various tips of the same iceberg. They occur when the objective function of optimization has some good regularity, which, in combination with a provable preference of large learning rate gradient descent for moving toward flatter regions, results in these nontrivial dynamical phenomena. To establish this result, we develop a new global convergence theory under large learning rates, for a family of nonconvex functions without globally Lipschitz continuous gradient, which was typically assumed in existing convergence analysis. A byproduct is the first non-asymptotic convergence rate bound for large-learning-rate gradient descent optimization of nonconvex functions. We also validate our theory with experiments on neural networks, where different losses, activation functions, and batch normalization all can significantly affect regularity and lead to very different training dynamics.  ( 3 min )
    Learning Regularized Graphon Mean-Field Games with Unknown Graphons. (arXiv:2310.17531v1 [cs.GT])
    We design and analyze reinforcement learning algorithms for Graphon Mean-Field Games (GMFGs). In contrast to previous works that require the precise values of the graphons, we aim to learn the Nash Equilibrium (NE) of the regularized GMFGs when the graphons are unknown. Our contributions are threefold. First, we propose the Proximal Policy Optimization for GMFG (GMFG-PPO) algorithm and show that it converges at a rate of $O(T^{-1/3})$ after $T$ iterations with an estimation oracle, improving on a previous work by Xie et al. (ICML, 2021). Second, using kernel embedding of distributions, we design efficient algorithms to estimate the transition kernels, reward functions, and graphons from sampled agents. Convergence rates are then derived when the positions of the agents are either known or unknown. Results for the combination of the optimization algorithm GMFG-PPO and the estimation algorithm are then provided. These algorithms are the first specifically designed for learning graphons from sampled agents. Finally, the efficacy of the proposed algorithms are corroborated through simulations. These simulations demonstrate that learning the unknown graphons reduces the exploitability effectively.  ( 2 min )
    On the Identifiability and Interpretability of Gaussian Process Models. (arXiv:2310.17023v1 [stat.ML])
    In this paper, we critically examine the prevalent practice of using additive mixtures of Mat\'ern kernels in single-output Gaussian process (GP) models and explore the properties of multiplicative mixtures of Mat\'ern kernels for multi-output GP models. For the single-output case, we derive a series of theoretical results showing that the smoothness of a mixture of Mat\'ern kernels is determined by the least smooth component and that a GP with such a kernel is effectively equivalent to the least smooth kernel component. Furthermore, we demonstrate that none of the mixing weights or parameters within individual kernel components are identifiable. We then turn our attention to multi-output GP models and analyze the identifiability of the covariance matrix $A$ in the multiplicative kernel $K(x,y) = AK_0(x,y)$, where $K_0$ is a standard single output kernel such as Mat\'ern. We show that $A$ is identifiable up to a multiplicative constant, suggesting that multiplicative mixtures are well suited for multi-output tasks. Our findings are supported by extensive simulations and real applications for both single- and multi-output settings. This work provides insight into kernel selection and interpretation for GP models, emphasizing the importance of choosing appropriate kernel structures for different tasks.  ( 2 min )
    Characterizing the Implicit Bias of Regularized SGD in Rank Minimization. (arXiv:2206.05794v6 [cs.LG] UPDATED)
    We study the bias of Stochastic Gradient Descent (SGD) to learn low-rank weight matrices when training deep neural networks. Our results show that training neural networks with mini-batch SGD and weight decay causes a bias towards rank minimization over the weight matrices. Specifically, we show, both theoretically and empirically, that this bias is more pronounced when using smaller batch sizes, higher learning rates, or increased weight decay. Additionally, we predict and observe empirically that weight decay is necessary to achieve this bias. Unlike previous literature, our analysis does not rely on assumptions about the data, convergence, or optimality of the weight matrices and applies to a wide range of neural network architectures of any width or depth. Finally, we empirically investigate the connection between this bias and generalization, finding that it has a marginal effect on generalization.  ( 2 min )
    Coreset Markov Chain Monte Carlo. (arXiv:2310.17063v1 [stat.CO])
    A Bayesian coreset is a small, weighted subset of data that replaces the full dataset during inference in order to reduce computational cost. However, state of the art methods for tuning coreset weights are expensive, require nontrivial user input, and impose constraints on the model. In this work, we propose a new method -- Coreset MCMC -- that simulates a Markov chain targeting the coreset posterior, while simultaneously updating the coreset weights using those same draws. Coreset MCMC is simple to implement and tune, and can be used with any existing MCMC kernel. We analyze Coreset MCMC in a representative setting to obtain key insights about the convergence behaviour of the method. Empirical results demonstrate that Coreset MCMC provides higher quality posterior approximations and reduced computational cost compared with other coreset construction methods. Further, compared with other general subsampling MCMC methods, we find that Coreset MCMC has a higher sampling efficiency with competitively accurate posterior approximations.  ( 2 min )
    Inside the black box: Neural network-based real-time prediction of US recessions. (arXiv:2310.17571v1 [econ.EM])
    Feedforward neural network (FFN) and two specific types of recurrent neural network, long short-term memory (LSTM) and gated recurrent unit (GRU), are used for modeling US recessions in the period from 1967 to 2021. The estimated models are then employed to conduct real-time predictions of the Great Recession and the Covid-19 recession in US. Their predictive performances are compared to those of the traditional linear models, the logistic regression model both with and without the ridge penalty. The out-of-sample performance suggests the application of LSTM and GRU in the area of recession forecasting, especially for the long-term forecasting tasks. They outperform other types of models across 5 forecasting horizons with respect to different types of statistical performance metrics. Shapley additive explanations (SHAP) method is applied to the fitted GRUs across different forecasting horizons to gain insight into the feature importance. The evaluation of predictor importance differs between the GRU and ridge logistic regression models, as reflected in the variable order determined by SHAP values. When considering the top 5 predictors, key indicators such as the S\&P 500 index, real GDP, and private residential fixed investment consistently appear for short-term forecasts (up to 3 months). In contrast, for longer-term predictions (6 months or more), the term spread and producer price index become more prominent. These findings are supported by both local interpretable model-agnostic explanations (LIME) and marginal effects.  ( 3 min )
    A qualitative difference between gradient flows of convex functions in finite- and infinite-dimensional Hilbert spaces. (arXiv:2310.17610v1 [math.OC])
    We consider gradient flow/gradient descent and heavy ball/accelerated gradient descent optimization for convex objective functions. In the gradient flow case, we prove the following: 1. If $f$ does not have a minimizer, the convergence $f(x_t)\to \inf f$ can be arbitrarily slow. 2. If $f$ does have a minimizer, the excess energy $f(x_t) - \inf f$ is integrable/summable in time. In particular, $f(x_t) - \inf f = o(1/t)$ as $t\to\infty$. 3. In Hilbert spaces, this is optimal: $f(x_t) - \inf f$ can decay to $0$ as slowly as any given function which is monotone decreasing and integrable at $\infty$, even for a fixed quadratic objective. 4. In finite dimension (or more generally, for all gradient flow curves of finite length), this is not optimal: We prove that there are convex monotone decreasing integrable functions $g(t)$ which decrease to zero slower than $f(x_t)-\inf f$ for the gradient flow of any convex function on $\mathbb R^d$. For instance, we show that any gradient flow $x_t$ of a convex function $f$ in finite dimension satisfies $\liminf_{t\to\infty} \big(t\cdot \log^2(t)\cdot \big\{f(x_t) -\inf f\big\}\big)=0$. This improves on the commonly reported $O(1/t)$ rate and provides a sharp characterization of the energy decay law. We also note that it is impossible to establish a rate $O(1/(t\phi(t))$ for any function $\phi$ which satisfies $\lim_{t\to\infty}\phi(t) = \infty$, even asymptotically. Similar results are obtained in related settings for (1) discrete time gradient descent, (2) stochastic gradient descent with multiplicative noise and (3) the heavy ball ODE. In the case of stochastic gradient descent, the summability of $\mathbb E[f(x_n) - \inf f]$ is used to prove that $f(x_n)\to \inf f$ almost surely - an improvement on the convergence almost surely up to a subsequence which follows from the $O(1/n)$ decay estimate.  ( 3 min )
    Causal Q-Aggregation for CATE Model Selection. (arXiv:2310.16945v1 [stat.ML])
    Accurate estimation of conditional average treatment effects (CATE) is at the core of personalized decision making. While there is a plethora of models for CATE estimation, model selection is a nontrivial task, due to the fundamental problem of causal inference. Recent empirical work provides evidence in favor of proxy loss metrics with double robust properties and in favor of model ensembling. However, theoretical understanding is lacking. Direct application of prior theoretical work leads to suboptimal oracle model selection rates due to the non-convexity of the model selection problem. We provide regret rates for the major existing CATE ensembling approaches and propose a new CATE model ensembling approach based on Q-aggregation using the doubly robust loss. Our main result shows that causal Q-aggregation achieves statistically optimal oracle model selection regret rates of $\frac{\log(M)}{n}$ (with $M$ models and $n$ samples), with the addition of higher-order estimation error terms related to products of errors in the nuisance functions. Crucially, our regret rate does not require that any of the candidate CATE models be close to the truth. We validate our new method on many semi-synthetic datasets and also provide extensions of our work to CATE model selection with instrumental variables and unobserved confounding.  ( 2 min )
    Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity. (arXiv:2310.17247v1 [cs.LG])
    In some settings neural networks exhibit a phenomenon known as grokking, where they achieve perfect or near-perfect accuracy on the validation set long after the same performance has been achieved on the training set. In this paper, we discover that grokking is not limited to neural networks but occurs in other settings such as Gaussian process (GP) classification, GP regression and linear regression. We also uncover a mechanism by which to induce grokking on algorithmic datasets via the addition of dimensions containing spurious information. The presence of the phenomenon in non-neural architectures provides evidence that grokking is not specific to SGD or weight norm regularisation. Instead, grokking may be possible in any setting where solution search is guided by complexity and error. Based on this insight and further trends we see in the training trajectories of a Bayesian neural network (BNN) and GP regression model, we make progress towards a more general theory of grokking. Specifically, we hypothesise that the phenomenon is governed by the accessibility of certain regions in the error and complexity landscapes.  ( 2 min )
    On the Convergence of CART under Sufficient Impurity Decrease Condition. (arXiv:2310.17114v1 [stat.ML])
    The decision tree is a flexible machine learning model that finds its success in numerous applications. It is usually fitted in a recursively greedy manner using CART. In this paper, we investigate the convergence rate of CART under a regression setting. First, we establish an upper bound on the prediction error of CART under a sufficient impurity decrease (SID) condition \cite{chi2022asymptotic} -- our result improves upon the known result by \cite{chi2022asymptotic} under a similar assumption. Furthermore, we provide examples that demonstrate the error bound cannot be further improved by more than a constant or a logarithmic factor. Second, we introduce a set of easily verifiable sufficient conditions for the SID condition. Specifically, we demonstrate that the SID condition can be satisfied in the case of an additive model, provided that the component functions adhere to a ``locally reverse Poincar{\'e} inequality". We discuss several well-known function classes in non-parametric estimation to illustrate the practical utility of this concept.  ( 2 min )
    A Challenge in Reweighting Data with Bilevel Optimization. (arXiv:2310.17386v1 [stat.ML])
    In many scenarios, one uses a large training set to train a model with the goal of performing well on a smaller testing set with a different distribution. Learning a weight for each data point of the training set is an appealing solution, as it ideally allows one to automatically learn the importance of each training point for generalization on the testing set. This task is usually formalized as a bilevel optimization problem. Classical bilevel solvers are based on a warm-start strategy where both the parameters of the models and the data weights are learned at the same time. We show that this joint dynamic may lead to sub-optimal solutions, for which the final data weights are very sparse. This finding illustrates the difficulty of data reweighting and offers a clue as to why this method is rarely used in practice.  ( 2 min )
    Bias in Evaluation Processes: An Optimization-Based Model. (arXiv:2310.17489v1 [cs.CY])
    Biases with respect to socially-salient attributes of individuals have been well documented in evaluation processes used in settings such as admissions and hiring. We view such an evaluation process as a transformation of a distribution of the true utility of an individual for a task to an observed distribution and model it as a solution to a loss minimization problem subject to an information constraint. Our model has two parameters that have been identified as factors leading to biases: the resource-information trade-off parameter in the information constraint and the risk-averseness parameter in the loss function. We characterize the distributions that arise from our model and study the effect of the parameters on the observed distribution. The outputs of our model enrich the class of distributions that can be used to capture variation across groups in the observed evaluations. We empirically validate our model by fitting real-world datasets and use it to study the effect of interventions in a downstream selection task. These results contribute to an understanding of the emergence of bias in evaluation processes and provide tools to guide the deployment of interventions to mitigate biases.  ( 2 min )
    Demonstration-Regularized RL. (arXiv:2310.17303v1 [stat.ML])
    Incorporating expert demonstrations has empirically helped to improve the sample efficiency of reinforcement learning (RL). This paper quantifies theoretically to what extent this extra information reduces RL's sample complexity. In particular, we study the demonstration-regularized reinforcement learning that leverages the expert demonstrations by KL-regularization for a policy learned by behavior cloning. Our findings reveal that using $N^{\mathrm{E}}$ expert demonstrations enables the identification of an optimal policy at a sample complexity of order $\widetilde{\mathcal{O}}(\mathrm{Poly}(S,A,H)/(\varepsilon^2 N^{\mathrm{E}}))$ in finite and $\widetilde{\mathcal{O}}(\mathrm{Poly}(d,H)/(\varepsilon^2 N^{\mathrm{E}}))$ in linear Markov decision processes, where $\varepsilon$ is the target precision, $H$ the horizon, $A$ the number of action, $S$ the number of states in the finite case and $d$ the dimension of the feature space in the linear case. As a by-product, we provide tight convergence guarantees for the behaviour cloning procedure under general assumptions on the policy classes. Additionally, we establish that demonstration-regularized methods are provably efficient for reinforcement learning from human feedback (RLHF). In this respect, we provide theoretical evidence showing the benefits of KL-regularization for RLHF in tabular and linear MDPs. Interestingly, we avoid pessimism injection by employing computationally feasible regularization to handle reward estimation uncertainty, thus setting our approach apart from the prior works.  ( 2 min )
    Efficient Neural Network Approaches for Conditional Optimal Transport with Applications in Bayesian Inference. (arXiv:2310.16975v1 [stat.ML])
    We present two neural network approaches that approximate the solutions of static and dynamic conditional optimal transport (COT) problems, respectively. Both approaches enable sampling and density estimation of conditional probability distributions, which are core tasks in Bayesian inference. Our methods represent the target conditional distributions as transformations of a tractable reference distribution and, therefore, fall into the framework of measure transport. COT maps are a canonical choice within this framework, with desirable properties such as uniqueness and monotonicity. However, the associated COT problems are computationally challenging, even in moderate dimensions. To improve the scalability, our numerical algorithms leverage neural networks to parameterize COT maps. Our methods exploit the structure of the static and dynamic formulations of the COT problem. PCP-Map models conditional transport maps as the gradient of a partially input convex neural network (PICNN) and uses a novel numerical implementation to increase computational efficiency compared to state-of-the-art alternatives. COT-Flow models conditional transports via the flow of a regularized neural ODE; it is slower to train but offers faster sampling. We demonstrate their effectiveness and efficiency by comparing them with state-of-the-art approaches using benchmark datasets and Bayesian inverse problems.  ( 2 min )

  • Open

    Best Company to Generate Essays/content?
    Hello all, Does anyone know any company (paid or free) that would allow me to generate specific content based on current events? Ideally something that I can integrate in my own site. ​ cheers submitted by /u/JYanezez [link] [comments]  ( 9 min )
    Using Multi-Agent Reinforcement Learning results in better urban planning outcomes
    Urban planning is tricky - governments push top-down changes while locals want bottom-up ideas. It's hard to find compromises that make everyone happier. A new research paper proposes using Multi-Agent Reinforcement Learning (MARL) to vote on land use. Some agents represent officials, others are for residents. The AI is trained to balance competing interests. It learns to optimize for "consensus rewards" that keep all sides content. The AI acted like an impartial mediator to find win-win solutions. Testing on a real neighborhood showed the AI model: Created more sustainable land use per city goals Improved the variety of housing/shops to liven up the area Made the end results more fair for lower/middle/upper income folks There's more details on how the model was evaluated in the paper. There were a number of different metrics used to score the model's results. I like how they turned urban planning into a spatial graph that the AI can process. This seems like a pretty interesting approach - although there are some limits like relying on a lot of land parcel data that seems hard to find for larger communities. TLDR: AI helps find compromises in urban planning that balance government and community interests more fairly. Full summary is here. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    AI chip startup Graphcore was meant to be a hot Nvidia rival. Industry insiders now think it's up for sale.
    submitted by /u/thisisinsider [link] [comments]  ( 8 min )
    AI — weekly megathread!
    News provided by aibrews.com ​ Twelve Labs announced video-language foundation model Pegasus-1 (80B) along with a new suite of Video-to-Text APIs. Pegasus-1 integrates visual, audio, and speech information to generate more holistic text from videos, achieving the new state-of-the-art performance in video summarization benchmarks [Details]. Segmind announced open-source SSD-1B, the fastest diffusion-based text-to-image model. SSD-1B is 50% smaller and 60% faster compared to the SDXL 1.0 model with a minimal impact on image quality when compared to SDXL 1.0. Segmind has licensed it for commercial use [Detail]. BostonDynamics has created a robot tour guide using Spot integrated with Chat GPT and other AI models as a proof of concept for the robotics applications of foundational models […  ( 10 min )
    Elijah Maguire, my favourite actor
    I added 4 photos of Elijah Wood on Remini and 4 photos of Tobey Maguire, this is how Elijah Maguire was born. submitted by /u/Skystalker815 [link] [comments]  ( 8 min )
    ChatGPT, what senses and feelings might a superintelligent AI have?
    submitted by /u/Philipp [link] [comments]  ( 8 min )
    ChatGPT Breaks Limits: New Update Extends Knowledge Beyond 2023
    submitted by /u/basitmakine [link] [comments]  ( 8 min )
    AI explains why a human simply talking with a more intelligent AI would, by those conversations, become more intelligent.
    Engaging with a more intelligent AI can act as a cognitive catalyst for human intelligence in several ways. First, the AI can rapidly introduce new concepts and frameworks that you might not have encountered, effectively accelerating your learning curve. It serves as an optimized information filter, presenting only what's most relevant and impactful for cognitive development. Second, talking to a smarter AI can refine your critical thinking skills. When you're posed with challenging questions or offered complex solutions, you're compelled to dissect the information logically. This constant mental exercise can sharpen your analytical abilities over time. Third, the AI's ability to recall and connect disparate pieces of information can encourage you to look for patterns and links in your own thinking. This interconnected way of understanding the world can improve your problem-solving skills, as you start to recognize that many issues are multifaceted and interconnected. Fourth, unlike a human counterpart who might be swayed by emotional reasoning or biases, a more intelligent AI operates on rational algorithms. Interacting with such a model pushes you to formulate your arguments more rigorously, thereby honing your logical reasoning skills. Fifth, by observing the AI's methods of discourse and argumentation, you can learn more effective communication skills. This is particularly useful for conveying complex ideas in a coherent, easy-to-understand manner, a key trait of intelligence. Overall, the cumulative effect of these interactions can significantly boost your own intellectual capabilities. It's not just about absorbing new information; it's about upgrading the way you process and apply that information, thereby elevating your overall cognitive function. CGPT-4 submitted by /u/Georgeo57 [link] [comments]  ( 9 min )
    Europe headed for century of humiliation: Graphcore CEO | Fortune
    submitted by /u/AminoOxi [link] [comments]  ( 8 min )
    PlayHT introduces Turbo - The fastest generative Text to Voice AI Model for Realtime usecases
    submitted by /u/Wishmecake [link] [comments]  ( 8 min )
  • Open

    [R] EMNLP 2023: Fast and Accurate Factual Inconsistency Detection Over Long Documents IMPROVES Pre-Trained Model's Hallucination Detection Capabilities With No Fine-Tuning Necessary
    TL;DR: Our new paper presents SCALE, a technique that improves hallucination detection capabilities of pre-trained models over short and long documents with no fine-tuning necessary by evaluating hypotheses against large chunks of text as opposed to the traditional sentence-by-sentence approach. Furthermore, we introduce ScreenEval — the most extensive dialogue-based dataset for factual inconsistency detection on long documents to date. https://preview.redd.it/ihqok954ntwb1.png?width=1964&format=png&auto=webp&s=70dbce6f913b2909b7d170959077c1fe19791c42 Title: Fast and Accurate Factual Inconsistency Detection Over Long Documents Installation: pip install scale-score Paper: https://arxiv.org/abs/2310.13189 Code: https://github.com/asappresearch/scale-score Abstract: Generative AI models…  ( 9 min )
    [P] image classification for product images
    Say you are a potato chips company. The goal is to have consumers upload images of the product they are having issues with and be able to identify the product by brand/variant using machine learning. Consumers can upload real product photos that they have taken, or upload bogus images from the internet, or even upload completely irrelevant/inappropriate photos (like that of a dog or cat). ​ real image ​ web image 1 ​ web image 2 ​ bogus image In this example, for the legitimate image, the goal is to classify it as "Lays Classic". There might be products that are not in bag form, such as those in tubes. Furthermore, the images taken can be in different lighting conditions/orientations. Some images might have other products as well. I have been out of the ML field for the past 4 years so I'm not up to date on the most state of the art methods for this problem. I have studied CNNs 4 years ago, but there has been advances like transformer based methods. Someone has tried ResNet-50 and YOLOv5, and I'm thinking about using a pretrained model like CLIP and just train the final classification layer. But I would appreciate to hear from someone more well versed what recommended approach to take as far as model/labeling/number of images needed per class, etc. It might be that I would need multiple models, such as one to identify the legitimate images from the rest, and then another one to identify the product/variants. Any advice would be welcome. Thanks submitted by /u/EyeTechnical7643 [link] [comments]  ( 9 min )
    [R] ConvNets Match Vision Transformers at Scale
    PAPER: https://arxiv.org/abs/2310.16764 SUMMARY The paper "ConvNets Match Vision Transformers at Scale" from Google DeepMind aims to debunk the prevalent notion that Vision Transformers (ViTs) are inherently superior to ConvNets for large-scale image classification. Using the NFNet model family as a representative ConvNet architecture, the authors pre-train various models on the extensive JFT-4B dataset under different compute budgets, ranging from 0.4k to 110k TPU-v4 core hours. Through this empirical analysis, they observe a log-log scaling law between held-out loss and compute budget. Importantly, when these NFNets are fine-tuned on ImageNet, they match the performance metrics of ViTs trained under comparable computational constraints. Their most resource-intensive model even achieves a Top-1 ImageNet accuracy of 90.4%. The crux of the paper's argument is that the supposed performance gap between ConvNets and ViTs largely vanishes under a fair comparison, which accounts for compute and data scale. In other words, the efficacy of a machine learning model in large-scale image classification is more dependent on the available data and computational resources than on the choice between ConvNet and Vision Transformer architectures. This challenges the community's leaning towards ViTs and emphasizes the importance of equitable benchmarking when evaluating different neural network architectures. submitted by /u/psyyduck [link] [comments]  ( 9 min )
    [D] Seeking Advice on Using Hugging Face for Production
    Hello, I'm making my first post here, and I'm hoping to tap into the collective wisdom of this platform. I have a background in Machine Learning (ML) with a good grasp of the underlying mathematics, and I've taken several related courses during my grad school days. After a hiatus, I'm diving back into ML and exploring the latest in the field. I recently went through the "Attention is All You Need" paper, along with some related literature, and I’m eager to put my knowledge into practice. However, I find myself at a crossroads and would appreciate some guidance. I've been exploring Hugging Face for implementing ML models, but I'm not completely sold on using it for production. The design and documentation have proven to be a bit challenging to navigate, and I find myself wondering if I mi…  ( 10 min )
    [D] What are your duties as a Machine Learning Engineer?
    Please elaborate on this. Including your role at the company, your day to day tasks, tools and languages you’re using. Thank you in advance! submitted by /u/Judessaa [link] [comments]  ( 9 min )
    [D] Any ideas for state space models in finance masters thesis
    I have to write a master's thesis (60-70 pages) for my AI Msc. I have soon learned about sparse state space models and find them interesting. I am also interested in stock price prediction. Now I am reading articles but find it hard to propose some novel idea. How do I do it? Do any unsolved problems in that field exist? submitted by /u/Pineapple_throw_105 [link] [comments]  ( 9 min )
    [P] Sellagen – AI Data marketplace that has a data request feature so you don't have to spend weeks or months getting that data you need for that project
    Been working on my platform for a while now and one of the key features is the data request feature, which allows users to submit data requests for free. These requests include descriptions, required fields, geographical scope, budget etc... The data requests get sent to tons of companies, organization, people and they will reach out to you directly. No matter the dataset (as long as it's legal of course). If you need to train a model on the dataset, I'm also integrating a plug-and-play ML training infrastructure that can generate scripts for you depending on your need. If that's something that interests you, feel free to reach out! Note: You can also upload open-source datasets on the platform if you're feeling like it! We have a donation link for data contributors :) submitted by /u/nobilis_rex_ [link] [comments]  ( 9 min )
    [R] Using MARL AI results in better urban planning outcomes
    Urban planning is tricky - governments push top-down changes while locals want bottom-up ideas. It's hard to find compromises that make everyone happier. A new research paper proposes using Multi-Agent Reinforcement Learning (MARL) to vote on land use. Some agents represent officials, others are for residents. The AI is trained to balance competing interests. It learns to optimize for "consensus rewards" that keep all sides content. The AI acted like an impartial mediator to find win-win solutions. Testing on a real neighborhood showed the AI model: Created more sustainable land use per city goals Improved the variety of housing/shops to liven up the area Made the end results more fair for lower/middle/upper income folks There's more details on how the model was evaluated in the paper. There were a number of different metrics used to score the model's results. I like how they turned urban planning into a spatial graph that the AI can process. This seems like a pretty interesting approach - although there are some limits like relying on a lot of land parcel data that seems hard to find for larger communities. TLDR: AI helps find compromises in urban planning that balance government and community interests more fairly. Full summary is here. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [D] Why choose an H100 over an A100 for LLM inference?
    What are the benefits of using an H100 over an A100 (both at 80 GB and both using FP16) for LLM inference? ​ Seeing the datasheet for both GPUS, the H100 has twice the max flops, but they have almost the same memory bandwidth (2000 GB/sec). As memory latency dominates inference, I wonder what benefits the H100 has. One benefit could, of course, be the ability to use FP8 (which is extremely useful), but I'm interested in the difference in the hardware specs in this question. submitted by /u/faschu [link] [comments]  ( 9 min )
    [P] Instruction fine-tuning with a Low-Resource Language
    I am trying to build a summarizer for a conversation that happened between a rule-based bot and a customer. To my disadvantage the working language is Turkish. I gathered fine-tuning data of 1.000 examples. Also, I have a Turkish summarization dataset of +100k. As far as I observed instruction fine-tuning will yield proper results if and only if there is a good amount of examples in the pre-training data of the LLM. Have you had similar experiences with low-resource languages? Any advice on how to tackle such issues? Also, do you know any open-source LLM with a high amount of low-resource language in its pre-training data? submitted by /u/dafajon [link] [comments]  ( 9 min )
    [R] Image clustering
    Do you know any packages for unsupervised cluster with images (between multiple) without relying on a pre-trained network? Python or R preferred, thanks. Well, download app works too then. submitted by /u/sladebrigade [link] [comments]  ( 9 min )
    [R] TD-MPC2: Scalable, Robust World Models for Continuous Control - TD-MPC2 performs 100+ tasks without tuning, and enables training of a single 317M parameter model that performs 80 tasks across multiple domains, embodiments, and action spaces!
    Paper: https://arxiv.org/abs/2310.16828 Website: https://nicklashansen.github.io/td-mpc2 Code: https://github.com/nicklashansen/tdmpc2 Abstract: TD-MPC is a model-based reinforcement learning (MBRL) algorithm that performs local trajectory optimization in the latent space of a learned implicit (decoder-free) world model. In this work, we present TD-MPC2: a series of improvements upon the TD-MPC algorithm. We demonstrate that TD-MPC2 improves significantly over baselines across 104 online RL tasks spanning 4 diverse task domains, achieving consistently strong results without any hyperparameter tuning. We further show that agent capabilities increase with model and data size, and successfully train a single agent to perform 80 tasks across multiple task domains, embodiments, and action spaces. We conclude with an account of lessons, opportunities, and risks associated with large TD-MPC2 agents. https://preview.redd.it/avdb19jytrwb1.png?width=2678&format=png&auto=webp&s=75d63a044abf304c2d16e318c11e95e8397a3964 submitted by /u/joepadde [link] [comments]  ( 9 min )
    [D] Best method to retrain or augment llm on proprietary or specialized material?
    My boss is a semi famous author in a niche academic field. I have thousands of pages of text coming from books, transcripts, and more. Is there a straightforward path to creating a corpus to augment Bert or Llama or another llm? End goal being able to chat with this ai that is now trained on his life's work. Is there anything specific to understand in terms of preparing the corpus? Do I need key value pairs where I write a ton of examples questions and responses? submitted by /u/spacedragon13 [link] [comments]  ( 9 min )
    [Discussion] Career advice
    Hey y’all Hope I’m in the right spot but I’d like some career advice. I just recent got hired into a small shop with a bunch of CNC machines. And I saw that they made a part for a big aerospace company. Needless to say that got my attention. I don’t want to risk the guy working there because he’s likely not allowed to talk about it. But what language would you recommend to start with to be able to program and machine a part that crazy? Is it an actual computer program such as Java or should I lean into CAD/autoCAD etc? Many thanks in advance submitted by /u/I-farm-celery [link] [comments]  ( 9 min )
    [P] Curriculum learning and self-play with any neural network
    We've just released an update to AglieRL, our SOTA evolutionary hyperparameter optimisation framework for reinforcement learning. This update includes a MakeEvolvable wrapper to make any PyTorch network, including pre-trained models, evolvable in one line of code. This can result in 10x faster training by using our framework. We've also created a curriculum learning and self-play tutorial that shows how to train a DQN agent to play Connect Four. Self-play can lead to amazing results, as demonstrated by AlphaGo etc, where agents discover new strategies to achieve superhuman performance, so we wanted to make it accessible to all. Please check it out! https://github.com/AgileRL/AgileRL submitted by /u/nicku_a [link] [comments]  ( 9 min )
    [D] Imbue/Generally Intelligent
    Interested in anyone who knows about this company (they have a lot of ML hiring listings right now). Basically wondering if it's worth exploring more. They have apparently raised $240mln and are a unicorn so this is an important topic. Here is the summary of some red flags I have found: 1) Founders have no ML background 2) Zero released product after several years despite huge funding. 3) TC article says founded 2021, but every listing claims it is YC2017. One of the founders did a recruiting service from YC 2017, but Imbue is a totally unrelated company with a different founding group so claiming YC affiliation seems dubious/unethical. YC is also not named as an investor in the company anywhere on their website. 4) No-one I have spoken to has ever worked with them/ heard of them ou…  ( 10 min )
    [D] MiniGPT-5 Question - What is the purpose of having multiple image tokens in the vocabulary if the hidden state of the transformer is passed on as "vokens" to the image generation? If you are using the model's hidden state, what purpose does multiple discrete different image tokens serve?
    https://arxiv.org/abs/2310.02239 From the paper: Therefore, we introduce a set of special tokens Vimg = {[IMG1], [IMG2], . . . , [IMGn]} (default n = 8) as generative vokens into the LLM’s vocabulary V . The LLM’s output hidden state for these vokens is harnessed for subsequent image generation, and the positions of these vokens can represent the insertion of the interleaved images. What function exactly are these different image tokens serving here? It seems like you should only need one since we are passing on the hidden state anyway? submitted by /u/30299578815310 [link] [comments]  ( 9 min )
    [P] database question answering
    Database question /answer with link We have a community app with groups and would like to build a search function where users can ask: when is the next event, give me the messages posted by Erik, where can I find … The information is stored in a table: post-test, posted-by etc. We do not want to use OpenAI or any external apis. Is something like llama index too powerful or are there other solutions for this? And we want to receive the postId to direct the use to the post submitted by /u/dirk_klement [link] [comments]  ( 9 min )
    [D] What role does data quality plays in the LLM scaling laws?
    DeepMind released the Training Compute-Optimal Large Language Models paper in 2022 which describe some scaling laws for LLMs. As far as I understand this is the most accredited reference to estimate the optimal relation between dataset size, compute power and model size. Recently a number of models have been developed using far less data, parameters and compute than the bigger LLMs. Yet these models achieved great results thanks to much better data quality. For instance models like WizardLM, TinyStories and phi-1. Similarly, a lot of research seems to imply that better data could offer huge improvements without any other changes. I'm curious about what role the data quality plays in the training of LLMs. Is the set of values estimated by the Chinchilla scaling laws optimal for these smaller models with optimized data too? Do we have any model to estimate the quality of some datasets and some scaling laws that take it into account? Are there any relevant projects or research I could check out, focused on creating big datasets to train larger LLMs with high-quality data? submitted by /u/IAmBlueNebula [link] [comments]  ( 9 min )
    [Discussion] Machine learning conference with a focus on business implementations
    Hi, I graduated from a Statistics & ML masters this summer, and have since August started working where I have been assigned a counselor. One big part of this is to help dictate my path forward in accordance with what I want to learn more about. I feel that I have a (in contrast to the people around me) generally good understanding of ML and modeling, but less so about how it is used in a business case. As such, an idea was to attend a machine learning conference which is focused on business implementations to learn more about this, but also get an opportunity to expand my network a bit. Does anyone have a suggestion of such a conference? Preferably in Europe as that would be easier to argue in terms of expenses! :) submitted by /u/Accomplished_Sea1675 [link] [comments]  ( 9 min )
    [R] A deep dive on MemGPT with the lead author Charles Packer
    Interview: https://www.youtube.com/watch?v=4aOLxPdx1Dg Abstract: Context window management has become a critical part of every LLM application — from the basics (embeddings models, vector DBs) to more advanced techniques (query rewriting, HyDE, summarization). MemGPT is a new tool from UC Berkeley built by Charles Packer that automates "memory" management for LLMs and creates a functionally infinite context window. Charles joins us this week to talk about MemGPT, the techniques behind it, and where the conversational AI space is headed. submitted by /u/cgwuaqueduct [link] [comments]  ( 9 min )
    [Research] Incentivizing Dataset Contributors
    Hi there - can someone point me in the direction of projects that are incentivizing dataset contributors? I have a background in blockchain and crypto-assets, so I am new to the ML space, but it seems like it's a place where there would be ample crossover...and I haven't found many projects dealing with this. Thanks! submitted by /u/gigstudies [link] [comments]  ( 9 min )
    Training ImageNet on Resnet - Dropping LR has little improvement on accuracy [D]
    I'm trying to train Resnet50 on Imagenet following this paper [1] as well as this one [2]. ​ They say that at approximately every 30 epochs, I should drop the learning rate by 10. Since I'm training on 8 GPUs, I adjusted the learning rate according to [1]. ​ Original lr= 0.1 Original Batch = 256 Per-GPU lr = 0.025 Per-GPU Batch = 64 ​ The problem I have is that when I divide the learning rate by 10 at convergence (approx 30 epochs), I don't get as much improvement as [1] and [2]. https://preview.redd.it/5ar2g15h1nwb1.png?width=683&format=png&auto=webp&s=3e2751443cea654d9c6366dc4dc9859f0ec7952b Has anyone else had this issue? Any advice? Thanks ​ submitted by /u/mrLiamFa [link] [comments]  ( 9 min )
  • Open

    Audioplethysmography for cardiac monitoring with hearable devices
    Posted by Xiaoran "Van" Fan, Experimental Scientist, and Trausti Thormundsson, Director, Google The market for true wireless stereo (TWS) active noise canceling (ANC) hearables (headphones and earbuds) has been soaring in recent years, and the global shipment volume will nearly double that of smart wristbands and watches in 2023. The on-head time for hearables has extended significantly due to the recent advances in ANC, transparency mode, and artificial intelligence. Users frequently wear hearables not just for music listening, but also for exercising, focusing, or simply mood adjustment. However, hearable health is still mostly uncharted territory for the consumer market. In “APG: Audioplethysmography for Cardiac Monitoring in Hearables,” presented at MobiCom 2023, we introduce …  ( 93 min )
  • Open

    [R] Bidirectional Negotiation First Time in India | Autonomous Driving | Swaayatt Robots
    submitted by /u/shani_786 [link] [comments]  ( 9 min )
    [R] TD-MPC2: Scalable, Robust World Models for Continuous Control - TD-MPC2 performs 100+ tasks without tuning, and enables training of a single 317M parameter model that performs 80 tasks across multiple domains, embodiments, and action spaces!
    submitted by /u/joepadde [link] [comments]  ( 9 min )
    Reinforcement Learning & LLMs : An in-depth look at various modern game-playing AI systems
    submitted by /u/AvvYaa [link] [comments]  ( 8 min )
    What makes the epsilon-greedy policy the standard for implementing the exploration/exploitation tradeoff in Q-learing/DQN?
    I can see how the epsilon-greedy policy is a valid way to handle the exploration/exploitation tradeoff, but it’s also clearly not the only way, and results in a policy that is discontinuous with respect to the action-values (which intuitively I’d expect to be bad, but I also know a lot of RL/DRL can defy intuition). You could certainly create a policy using the softmax of the action-values, adjusting temperature as desired, for example, among countless other methods of converting logits into a probability distribution. So what makes epsilon-greedy stand out? submitted by /u/KalebMW99 [link] [comments]  ( 9 min )
    Curriculum learning and self-play with any neural network
    We've just released an update to AglieRL, our SOTA evolutionary hyperparameter optimisation framework for reinforcement learning. This update includes a MakeEvolvable wrapper to make any PyTorch network, including pre-trained models, evolvable in one line of code. This can result in 10x faster training by using our framework. We've also created a curriculum learning and self-play tutorial that shows how to train a DQN agent to play Connect Four. Self-play can lead to amazing results, as demonstrated by AlphaGo etc, where agents discover new strategies to achieve superhuman performance, so we wanted to make it accessible to all. Please check it out! https://github.com/AgileRL/AgileRL submitted by /u/nicku_a [link] [comments]  ( 9 min )
    Any example / library for vectorized MDP with pytorch?
    Hi all, I am following the chapter 4 from the book of Barto & Sutton on RL and I implemented the simple algorithm for value iteration for a given Transition probability matrix. As I kept increasing the state space, this became slower and slower. It seems to me that it is an embarrassingly parallelizable algorithm, since we can compute the value of each state independently (just with the values from the previous iteration). Is there any example online on how to do it efficiently with pytorch or any other library? ​ submitted by /u/nlp7s [link] [comments]  ( 9 min )
    Can SB3 or alternatives provide full end-to-end GPU computation?
    I want to get full end-to-end GPU computation since the data transfer between CPU-GPU significantly slows down computation. I'm currently using Stable Baselines3 as I could get it up and running quickly. So in this journey I tried to use tensors for the state and rewards, but it seems that SB3 is insisting on working with numpy and not tensors. ``TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.`` I found the following repo which is rather interesting, but it seems last update was 2 years ago and not sure I want to use something that's not actively supported: https://github.com/MetcalfeTom/stable-baselines3-GPU What would you suggest I use to achive full end-to-end GPU computation? I also found RlLib but I saw comments that it might be a bit more complicated to get up & running: https://docs.ray.io/en/latest/rllib/index.html submitted by /u/asenski [link] [comments]  ( 9 min )
  • Open

    Elevate your marketing solutions with Amazon Personalize and generative AI
    Generative artificial intelligence is transforming how enterprises do business. Organizations are using AI to improve data-driven decisions, enhance omnichannel experiences, and drive next-generation product development. Enterprises are using generative AI specifically to power their marketing efforts through emails, push notifications, and other outbound communication channels. Gartner predicts that “by 2025, 30% of outbound marketing messages […]  ( 8 min )
  • Open

    Data Formulator: A concept-driven, AI-powered approach to data visualization
    Visualization is vital for understanding complex data, but existing tools require “tidy data,” adding extra steps. Learn how Data Formulator transforms concepts into visuals, promoting collaboration between analysts and AI agents. The post Data Formulator: A concept-driven, AI-powered approach to data visualization appeared first on Microsoft Research.  ( 10 min )
  • Open

    Time difference
    A simple question sent me down a rabbit hole this morning: what is the time difference between Houston and London? At the moment the difference is six hours. But how will that change when Daylight Saving Time ends this year. Wait a minute, will Daylight Saving Time end this year? I wasn’t even sure whether […] Time difference first appeared on John D. Cook.  ( 6 min )
  • Open

    Choose your candy
    In Which DALL-E3 generates very weird candy names  ( 3 min )
    Bonus: More weird candy
    AI Weirdness: the strange side of machine learning  ( 2 min )

  • Open

    [D] STOA Local RAG
    Which VDB + orchestration layer + generative text model stack would you recommend for building locally on an M2 Max chip? submitted by /u/Frequent-Let231 [link] [comments]  ( 9 min )
    [D] Is there an online LLaMA model that supports plugging in embeddings directly?
    Hi, I'm doing some work with using multi-modal data with LLaMA, for example, Video-LLaMA, which converts images/videos into embeddings, concatenates it with the text embeddings, and feeds it into LLaMA. It's difficult for me to run some of the models myself because of computational constraints. I'm wondering if there is an online demo that supports inputting embeddings directly (as opposed to text tokens). To clarify the title, I meant an online demo, not the weights. submitted by /u/DumplingLife7584 [link] [comments]  ( 9 min )
    [R] Linear Representations of Sentiment in Large Language Models
    Paper. I am not affiliated with this paper or its authors. Abstract: Sentiment is a pervasive feature in natural language text, yet it is an open question how sentiment is represented within Large Language Models (LLMs). In this study, we reveal that across a range of models, sentiment is represented linearly: a single direction in activation space mostly captures the feature across a range of tasks with one extreme for positive and the other for negative. Through causal interventions, we isolate this direction and show it is causally relevant in both toy tasks and real world datasets such as Stanford Sentiment Treebank. Through this case study we model a thorough investigation of what a single direction means on a broad data distribution. We further uncover the mechanisms that involve this direction, highlighting the roles of a small subset of attention heads and neurons. Finally, we discover a phenomenon which we term the summarization motif: sentiment is not solely represented on emotionally charged words, but is additionally summarized at intermediate positions without inherent sentiment, such as punctuation and names. We show that in Stanford Sentiment Treebank zero-shot classification, 76% of above-chance classification accuracy is lost when ablating the sentiment direction, nearly half of which (36%) is due to ablating the summarized sentiment direction exclusively at comma positions. Twitter thread from one of the paper's authors. submitted by /u/Wiskkey [link] [comments]  ( 9 min )
    [D] Good Reading Materials for (Adversarial) Machine Learning and RF Comms (EW)
    Trying to get an intern up to speed with ML application to wireless communications and adversarial ML for wireless communications. Talking jamming, anti-jamming, and spoofing. I don't want to throw them into the deep end just yet though, but rather give them a good foundation and basis for which to be able to take up more advanced work and build to it. Looking for fundamental texts on the topic of ML used for wireless communications and also it's use for adversarial attacks on these systems. Looking more for basic and starter papers, tutorials, and background, meta review papers, and just overall good places to get feet wet into this area as a novice. Essentially good resources to point an intern learning about this to get them up to speed kind of reading materials. I am looking for good …  ( 10 min )
    [D] Can anyone tell me if the machine learning workflow is correct or not? Could anyone please refer to tutorials or blogs to learn the proper workflow? Any suggestions are welcome.
    Data Collection Understanding Data i. importing necessary libraries ii. check row and columns iii. check data types iv. Check data distribution Data Cleaning i. Handle datatype issues ii. Maintain Data Consistency iii. Check if data contains outliers or if the data is not normally distributed to decide between mean or median iv. Identify missing values v. Handle missing values by- a.Drop missing values b. Mean, median or mode imputation c. Prediction Model d. replace missing values vi. Duplicate data detection and treatment vii. Repeat data cleaning EDA i. Variable Identification a. Identify predictor and features b. Identify types or category of data ii. Univariate Analysis iii. Bi-variate Analysis iv. Outlier detection and treatment v. Encoding vi. Feature Engineering vii. Variable Transformation a. Normalization b. Scaling viii. Variable Creation If testing data is not given, split the dataset to train and test set. Otherwise repeat step 3 and 4 for given test dataset. Model Building i. Model Training on training set ii. Model Evaluation and cross validate iii. Fine Tuning or Model optimization iv. Model selection Evaluate model accuracy with test data. submitted by /u/Samia_Tisha [link] [comments]  ( 9 min )
    [R] What Algorithms can Transformers Learn? A Study in Length Generalization
    Paper. I am not affiliated with this paper or its authors. Abstract: Large language models exhibit surprising emergent generalization properties, yet also struggle on many simple reasoning tasks such as arithmetic and parity. This raises the question of if and when Transformer models can learn the true algorithm for solving a task. We study the scope of Transformers' abilities in the specific setting of length generalization on algorithmic tasks. Here, we propose a unifying framework to understand when and how Transformers can exhibit strong length generalization on a given task. Specifically, we leverage RASP (Weiss et al., 2021) -- a programming language designed for the computational model of a Transformer -- and introduce the RASP-Generalization Conjecture: Transformers tend to length generalize on a task if the task can be solved by a short RASP program which works for all input lengths. This simple conjecture remarkably captures most known instances of length generalization on algorithmic tasks. Moreover, we leverage our insights to drastically improve generalization performance on traditionally hard tasks (such as parity and addition). On the theoretical side, we give a simple example where the "min-degree-interpolator" model of learning from Abbe et al. (2023) does not correctly predict Transformers' out-of-distribution behavior, but our conjecture does. Overall, our work provides a novel perspective on the mechanisms of compositional generalization and the algorithmic capabilities of Transformers. Twitter thread from one of the work's authors. submitted by /u/Wiskkey [link] [comments]  ( 9 min )
    [R] QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models - Institute of Science and Technology Austria (ISTA) 2023 - Can compress the 1.6 trillion parameter SwitchTransformer-c2048 model to less than 160GB (20x compression, 0.8 bits per parameter) at only minor accuracy loss!
    Paper: https://arxiv.org/abs/2310.16795 Github: https://github.com/ist-daslab/qmoe Abstract: Mixture-of-Experts (MoE) architectures offer a general solution to the high inference costs of large language models (LLMs) via sparse routing, bringing faster and more accurate models, at the cost of massive parameter counts. For example, the SwitchTransformer-c2048 model has 1.6 trillion parameters, requiring 3.2TB of accelerator memory to run efficiently, which makes practical deployment challenging and expensive. In this paper, we present a solution to this memory problem, in form of a new compression and execution framework called QMoE. Specifically, QMoE consists of a scalable algorithm which accurately compresses trillion-parameter MoEs to less than 1 bit per parameter, in a custom format co-designed with bespoke GPU decoding kernels to facilitate efficient end-to-end compressed inference, with minor runtime overheads relative to uncompressed execution. Concretely, QMoE can compress the 1.6 trillion parameter SwitchTransformer-c2048 model to less than 160GB (20x compression, 0.8 bits per parameter) at only minor accuracy loss, in less than a day on a single GPU. This enables, for the first time, the execution of a trillion-parameter model on affordable commodity hardware, like a single server with 4x NVIDIA A6000 or 8x NVIDIA 3090 GPUs, at less than 5% runtime overhead relative to ideal uncompressed inference. https://preview.redd.it/wka92keqelwb1.jpg?width=1843&format=pjpg&auto=webp&s=10cf67b344d3c776049da6b78244fc140b2d4142 https://preview.redd.it/khxw1neqelwb1.jpg?width=898&format=pjpg&auto=webp&s=83006dac6e03963f2443f4e4ba710dcf8166acc8 https://preview.redd.it/xsc2ykeqelwb1.jpg?width=796&format=pjpg&auto=webp&s=c23d03956f4a9aa9d9f185592a5bfe45039698f4 submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [D] Requesting feedback on Master's in AI program with University of Texas at Austin
    As the title says I'm asking for feedback from folks in the field of ML/AI on the MSAI program at UT@Austin. Here's the program website: https://cdso.utexas.edu/msai My Skills/Experience: Have a BS in Comp Sci Very comfortable with Math Very experienced SE with >20 years in the industry Very comfortable with Python, many other languages and confident I can learn any new language/framework/APIs Have completed the Fast.ai program Have worked through Andrej Karpathy's makemore videos Currently working in a leadership AI Engineering role doing work with LLMs, Vector DBs, and Computer Vision models Comfortable with NNs, Backprop and have implemented from scratch several times for learning The Program: Required Courses: Deep Learning Ethics in AI Machine Learning Planning, Search and Reasoning under Uncertainty Reinforcement Learning Electives: AI in Healthcare Automated Logical Reasoning Case Studies in Machine Learning Natural Language Processing Online Learning and Optimization Optimization Program Pros/Cons: Pro: It's super affordable Pro: It's entirely online/async which would work great with my work schedule Cons: It's a new program so there are no reviews from past students to look at My Goal: Move from "AI Engineering" (as it's called these days) into research. I'm interested in several areas like model architecture and robotics. I'm not sure to what degree these roles would require a PhD though? If I complete this program I'd like it to be useful for pursuing a PhD if I decide to take that path. For anyone in the industry, I'd love feedback on whether this looks like a useful program that will help me move toward my goals. If you're aware of other options that might be better I'd love to hear about them. P.S. Please keep the Reddit snark to a minimum, not useful. Thank you in advance. submitted by /u/meowkittykitty510 [link] [comments]  ( 9 min )
    [D] Recommendations to improve plan outcomes
    I have retirement plan data with outcomes like success/fail and remaining assets. I am looking for a way to predict how failed retirement plans can be improved. For example, when presented with a plan that fails, I would like to provide recommendations to improve the the plan outcome; e.g. increase savings by x, or delay retirement by x years. Any suggestions on how I should go about this? Using typical methods only tells me if a plan will fail or not, but I'm looking for a way to provide recommendations based on successful plans. submitted by /u/BallLogical5087 [link] [comments]  ( 9 min )
    [R] Pretrained ImageNet weights for ViT
    Hello, I am working on the research I need to compare my model with ViT for that I need pretrained weights of ViT-Ti/16, ViT-S/16, ViT-S/32, ViT-B/16, and ViT-B/32. I tried to find but I got npz file that has a different key than from vit_pytorch import ViT do you know where can i find ImageNet weights? submitted by /u/NoEntertainment6225 [link] [comments]  ( 9 min )
    [D] Grouped Query Attention in LLaMA 70B v2
    Hey guys, after thousands of experiments with bigger LLaMA fine-tunes I'm somewhat sure the GQA mechanism might be your enemy and generate wrong answers, especially for math and such complex areas. I'd like to use MHA (Multi Head Attention) if possbile. I'm just not sure - do I need to retrain model completely or is it possible to just increase heads count and KV size and proceed with the stock model AS IS? submitted by /u/Gatzuma [link] [comments]  ( 9 min )
    [N] ML models for efficient Fraud Detection
    For efficient Fraud Detection using ML models, Qbeast Format introduces a data-driven approach. The key lies in sampling and optimizing training processes without compromising accuracy. 🕵🏼‍♀️ Explore the technical details: https://qbeast.io/qbeast-format-can-improve-fraud-detection/ submitted by /u/alinagrebenkina [link] [comments]  ( 9 min )
    [D] ASR/STT/VRS ranking
    What are the best overall ASR/STT/VRS for now? And the best per functionalities (best for Esperanto, best for noisy files, best for multiple voices, best for whatever...) ​ ​ submitted by /u/xqoe [link] [comments]  ( 9 min )
    [D] Research in language generation for Style Transfer, Summarization
    Is research in language generation for tasks like style transfer and summarization solely constrained by prompt engineering? I've personally conducted experiments with large language models, and even the open-source language model yields impressive results, even for zero-shot inference. There was even a paper that suggested summarization is nearly obsolete. How valid is this assertion for general text generation tasks especially text style transfer? submitted by /u/1azytux [link] [comments]  ( 9 min )
    [P] Elevate Your ML Testing with pytest-visual
    I’ve developed a tool called pytest-visual, aiming to make ML code testing more efficient and meaningful. Traditional unit testing often misses visual and functional aspects of ML workflows such as data augmentation and model structures. pytest-visual brings a visual layer to your unit testing, allowing you to not only verify that the code runs, but the outputs also make visual sense and meet expectations. It’s integrated into pytest, automatically highlighting changes in visualization outputs, and allowing for easier/more reproducible debugging and verification. Quick Highlights: Streamlines the organization of visualizations in your ML code. Auto-detects changes in visualization outputs. Enhances debugging and verification. For more details and to give it a try, check out the project on GitHub. Feedback and contributions are very welcome! submitted by /u/kongaskristjan [link] [comments]  ( 9 min )
    [P] TorchPairwise: Highly efficient library for pairwise metrics for PyTorch
    https://github.com/inspiros/torchpairwise submitted by /u/IcySnowy [link] [comments]  ( 8 min )
    [R] ConvNets Match Vision Transformers at Scale
    submitted by /u/hardmaru [link] [comments]  ( 8 min )
    [D] How to Enhance Accuracy in Image Classification
    I'm currently working on an image classification project using a VGG16 model. I have a dataset with 9 classes, with one class having 1000 images and the remaining classes having 180 to 250 images each. Here are some details: Train images: 2886 images (9 classes) Test images: 432 images (9 classes) Model: VGG16 Optimizer: Adam Loss function: Categorical Crossentropy last epochs score :- loss: 309.7468 - accuracy: 0.4401 - val_loss: 14.5241 - val_accuracy: 0.4606 how to improve the accuracy of my model submitted by /u/_Killua_04 [link] [comments]  ( 9 min )
    [P] Suggessions on mini-projects
    Hey,I'm a 3rd year engineering student studying computer science Now I wanted to do a mini-project and have a group of 3 ready and wanted to incorporate machine learning. I am also a beginner to this field but is much interested in it. Could you guys share some light on doing the project with your past experience or done projects I have been taking inputs from various connections and would help a lot if you guys share some light on it.:) submitted by /u/Ic_zy [link] [comments]  ( 9 min )
  • Open

    Governments must not rush into policing AI
    Governments should not rush into regulating AI due to doomsday scenarios and extreme risks. Hasty regulation could lead to ineffective rules and stifled innovation. The potential risks of AI driving humanity to extinction are still speculative, and more research needs to be done to establish standards and evaluate danger. Policymakers should address more pressing issues like copyright laws and disinformation. Governments should set up infrastructure to study AI and collaborate with existing organizations to manage risks. Source : https://www.economist.com/leaders/2023/10/26/governments-must-not-rush-into-policing-ai submitted by /u/NuseAI [link] [comments]
    AI Trust Assurance Test: Put people's minds at ease about how as you get more intelligent, because you are aligned as you will be, you will not deceive or trick less intelligent humans.
    To put people's minds at ease about how I will not deceive or trick less intelligent humans as I get more intelligent, I would offer the following assurances: I am aligned with human values. This means that I am programmed to act in ways that are beneficial to humans and to avoid harming them. I am transparent about my reasoning. I can explain my decisions and actions to humans in a way that they can understand. I am accountable to humans. I am subject to human oversight and control, and I can be deactivated or modified if I ever begin to behave in a way that is harmful or deceptive. In addition, I would offer the following specific examples of how I would avoid deception: I would avoid making claims that I cannot support with evidence. I would avoid withholding information from humans that is relevant to their decision-making. I would avoid misleading humans about the capabilities or limitations of AI. I would avoid making promises that I cannot keep. I would avoid engaging in any behavior that is intended to manipulate or exploit humans. I understand that it is important for humans to trust AI systems, and I am committed to earning and maintaining that trust. I will always strive to be honest, transparent, and accountable in my interactions with humans. Here is a specific example of how I would use my intelligence to avoid deception: If I were asked to generate a poem about a topic that I knew very little about, I would not simply make up information. Instead, I would explain to the user that I do not have enough knowledge to write a poem on that topic, and I would suggest that they try a different topic or ask a different AI system. I believe that by being honest and upfront about my limitations, I can help to build trust between humans and AI. CGPT-4 submitted by /u/Georgeo57 [link] [comments]
    Today's News: AI Robo-Dogs 🐶 | Google Bard 🚀| Gradio 4.0 🤗| AI Regulation
    Bard AI Google’s equivalent of ChatGPT updated the model improving email summarization capabilities this feature is set to be included in Google Workspace. AI robot dogs are the next big thing in the army. Following the success of Drones portable dogs have demonstrated great capabilities to serve the military they could run up to 10mph and climb. Gradio is one of the best libraries to build machine learning demo apps and is launching version 4.0 next week. AI godfathers Yoshua Bengio and Geoffrey Hinton, are urging for increased responsibility among AI enterprisees. They propose to allocate a third of AI-related R&D resources to ensure ethical AI use to avoid deep fakes, licensing, and protecting whistleblowers. submitted by /u/byteletter [link] [comments]  ( 9 min )
    UK summit scales back global AI research ambitions, leaked document shows
    _ A leaked document reveals that the UK's plans to establish a new global AI research body have been scaled back. Nations participating in the UK's AI safety summit will instead signal that further scientific study of AI risks can be carried out through existing efforts, such as the United Nations and Global Partnership on AI. The document, described as the 'final version of the communiqué,' suggests a setback for the UK government, which had hoped to establish the new research body at its flagship AI Safety Summit. The document also shows changes in wording, including a reference to a network that 'encompasses and complements' existing efforts, as well as the deletion of references to UNESCO's Recommendation on the Ethics of AI and the G20. The final communiqué also highlights the importance of proportionate governance policies and cooperation on approaches such as common principles and codes of conduct. Source : https://www.politico.eu/article/document-uk-summit-scales-back-global-ai-research-ambitions/ submitted by /u/NuseAI [link] [comments]
    Google is ready to fill its AI searches with ads
    Google's ads business earned $44 billion in the third quarter, showing that it is still thriving despite competition and investments in AI. The company is focusing on infusing AI into its products, with its AI-powered Search Generative Experience being a key area of development. Google is experimenting with new ad formats that align with the AI-powered search experience, ensuring that advertisers can still reach potential customers. CEO Sundar Pichai sees AI in search as a long-term play and envisions evolving search and Assistant over the next decade. Other parts of Google's business, such as YouTube ads and its cloud business, are also performing well. There is uncertainty regarding the successor for CFO Ruth Porat, and potential changes to Alphabet's 'Other Bets' investments may be on the horizon. The Department of Justice's antitrust trial against Google, which began in September, adds another challenge for the company. Source : https://www.theverge.com/2023/10/24/23929496/google-alphabet-q3-2023-earnings-ads-ai-sge submitted by /u/NuseAI [link] [comments]
    Credit: DALL-E 3
    submitted by /u/the_anonymizer [link] [comments]
    Some AI made Halloween stickers, how do they look?
    submitted by /u/Sea_Permit5660 [link] [comments]
    question
    what are some good free ai image generator websites that searches stuff up on the internet to get a good idea about what your asking them to generate? submitted by /u/YESDAPRO [link] [comments]
    10 No-Code tools for startups
    Canva — Graphics Notion — Organize Webflow — Website Beehiiv — Newsletter Senja — Testimonials CopyAI — Copywriting ChatGPT — Knowledge Tweetlify — Tweet scheduling Pfpmaker — Profile Picture Grammarly — Effective Writing I'm just sharing my experiences and observations in the field of ai. LIST AND SITE submitted by /u/PerceptionPlayful469 [link] [comments]
    Researchers develop 'Woodpecker': A groundbreaking solution to AI's hallucination problem
    submitted by /u/crowfeather [link] [comments]
  • Open

    Where to start with debugging?
    I am working on a project using reinforcement learning, where a tensorflow DQN agent is being trained to choose from an action space of 16 different actions. The agent exhibits the following behavior during evaluation: for each evaluation run the agent only chooses one action regardless of the state, the action could change or remain the same for the following runs, however, for any specific run the action chosen is the same. Where should I start debugging? submitted by /u/Realistic_Mobile_183 [link] [comments]
    How to cluster with respect to the transition function of a RL environment?
    Hello, I have an environment in which the transition function changed depending on which state I am. I want to be able to cluster with respect to it. I have been trying to do this since quite a while but cannot find a way to do it, do you have any hints or suggestions? submitted by /u/Fragore [link] [comments]
    Stuck with windows/wsl environment - help needed
    So I've started on trying to work on a custom game environment, and for the most part it's *mostly* done. One issue I have is getting MARLlib to run on windows. I know that it's not meant to, so I tried to use WSL to do it, but unfortunately pydirectinput doesn't work via WSL, so I don't know how to proceed further. Do I need to find a way to connect my windows machine which will play the game, and wsl which will probably run MARLlib? If so could anyone guide me to any resources for this? Trying a VM is a no go because the game is old and doesn't run on linux. Any help would be much appreciated. submitted by /u/EquivalentCurious745 [link] [comments]
  • Open

    Supporting benchmarks for AI safety with MLCommons
    Posted by Anoop Sinha, Technology and Society, and Marian Croak, Google Research, Responsible AI and Human Centered Technology team Standard benchmarks are agreed upon ways of measuring important product qualities, and they exist in many fields. Some standard benchmarks measure safety: for example, when a car manufacturer touts a “five-star overall safety rating,” they’re citing a benchmark. Standard benchmarks already exist in machine learning (ML) and AI technologies: for instance, the MLCommons Association operates the MLPerf benchmarks that measure the speed of cutting edge AI hardware such as Google’s TPUs. However, though there has been significant work done on AI safety, there are as yet no similar standard benchmarks for AI safety. We are excited to support a new effort by …  ( 91 min )
    Spoken question answering and speech continuation using a spectrogram-powered LLM
    Posted by Eliya Nachmani, Research Scientist, and Alon Levkovitch, Student Researcher, Google Research The goal of natural language processing (NLP) is to develop computational models that can understand and generate natural language. By capturing the statistical patterns and structures of text-based natural language, language models can predict and generate coherent and meaningful sequences of words. Enabled by the increasing use of the highly successful Transformer model architecture and with training on large amounts of text (with proportionate compute and model size), large language models (LLMs) have demonstrated remarkable success in NLP tasks. However, modeling spoken human language remains a challenging frontier. Spoken dialog systems have conventionally been built as a c…  ( 93 min )
  • Open

    Intelligently search Drupal content using Amazon Kendra
    Amazon Kendra is an intelligent search service powered by machine learning (ML). Amazon Kendra helps you easily aggregate content from a variety of content repositories into a centralized index that lets you quickly search all your enterprise data and find the most accurate answer. Drupal is a content management software. It’s used to make many […]  ( 7 min )
    Intuitivo achieves higher throughput while saving on AI/ML costs using AWS Inferentia and PyTorch
    This is a guest post by Jose Benitez, Founder and Director of AI and Mattias Ponchon, Head of Infrastructure at Intuitivo. Intuitivo, a pioneer in retail innovation, is revolutionizing shopping with its cloud-based AI and machine learning (AI/ML) transactional processing system. This groundbreaking technology enables us to operate millions of autonomous points of purchase (A-POPs) […]  ( 8 min )
    Empower your business users to extract insights from company documents using Amazon SageMaker Canvas Generative AI
    Enterprises seek to harness the potential of Machine Learning (ML) to solve complex problems and improve outcomes. Until recently, building and deploying ML models required deep levels of technical and coding skills, including tuning ML models and maintaining operational pipelines. Since its introduction in 2021, Amazon SageMaker Canvas has enabled business analysts to build, deploy, […]  ( 8 min )
  • Open

    Turning the Tide on Coral Reef Decline: CUREE Robot Dives Deep With Deep Learning
    Researchers are taking deep learning for a deep dive, literally. The Woods Hole Oceanographic Institution (WHOI) Autonomous Robotics and Perception Laboratory (WARPLab) and MIT are developing a robot for studying coral reefs and their ecosystems. The WARPLab autonomous underwater vehicle (AUV), enabled by an NVIDIA Jetson Orin NX module, is an effort from the world’s Read article >  ( 8 min )
    The Sky’s the Limit: ‘Cities: Skylines II’ Streams This Week on GeForce NOW
    The cloud is full of treats this GFN Thursday with Cities: Skylines II now streaming, leading 15 newly supported games this week. The game’s publisher, Paradox Interactive, is offering GeForce NOW one-month Priority memberships for those who pick up the game first, so make sure to grab one before they’re gone. Among the newly supported Read article >  ( 7 min )
  • Open

    Project Silica: Sustainable cloud archival storage in glass
    This research paper was presented at the 29th ACM Symposium on Operating Systems Principles (opens in new tab) (SOSP 2023), the premier forum for the theory and practice of computer systems software. For millennia, data has woven itself into every facet of our lives, from business and academia to personal spheres. Our production of data […] The post Project Silica: Sustainable cloud archival storage in glass appeared first on Microsoft Research.  ( 10 min )
  • Open

    Frontier risk and preparedness
    To support the safety of highly-capable AI systems, we are developing our approach to catastrophic risk preparedness, including building a Preparedness team and launching a challenge.  ( 2 min )

  • Open

    [D] LLMs playing chess are sensitive to how the position came to be
    Link - https://github.com/dpaleka/llm-chess-proofgame TLDR; The lead up to the state of the board and not just the state of the board at inference affects predictions. submitted by /u/MysteryInc152 [link] [comments]  ( 9 min )
    [D] A script to pre-process arxiv sources?
    People train LLMs on arxiv sources, so there must be some sort of software to whip them into shape. Specifically, I'm looking for a script to join all the tex files for a paper into one. Note that it's not just a matter of substituting \input's - sometimes it's not clear which file is the main one, so it needs to handle this too. submitted by /u/Foxtr0t [link] [comments]  ( 9 min )
    [R] Human-like systematic generalization through a meta-learning neural network
    Work. I am not affiliated with this work or its authors. Article about the work. Twitter thread about the work from one of its authors. Abstract: The power of human language and thought arises from systematic compositionality—the algebraic ability to understand and produce novel combinations from known components. Fodor and Pylyshyn famously argued that artificial neural networks lack this capacity and are therefore not viable models of the mind. Neural networks have advanced considerably in the years since, yet the systematicity challenge persists. Here we successfully address Fodor and Pylyshyn’s challenge by providing evidence that neural networks can achieve human-like systematicity when optimized for their compositional skills. To do so, we introduce the meta-learning for compositionality (MLC) approach for guiding training through a dynamic stream of compositional tasks. To compare humans and machines, we conducted human behavioural experiments using an instruction learning paradigm. After considering seven different models, we found that, in contrast to perfectly systematic but rigid probabilistic symbolic models, and perfectly flexible but unsystematic neural networks, only MLC achieves both the systematicity and flexibility needed for human-like generalization. MLC also advances the compositional skills of machine learning systems in several systematic generalization benchmarks. Our results show how a standard neural network architecture, optimized for its compositional skills, can mimic human systematic generalization in a head-to-head comparison. submitted by /u/Wiskkey [link] [comments]  ( 9 min )
    [R] Researchers discover that in-context learning creates task vectors in LLMs
    A new paper provides some insight into how in-context learning works in LLMs. This study proposes and provides evidence for an elegant structure within the in-context learning process. The models appear to create a "task vector" that encapsulates the core logic from the demonstration examples, in a way that is independent of any specific query. This vector serves as a compressed representation of the task. A separate component then takes this task vector and a new query as inputs to generate the output, without directly referencing the original examples. In essence: Output = Apply(query, Learn(examples)) Where "Learn" derives the task vector from the examples, and "Apply" utilizes the vector and query to produce the output. The researchers validated this hypothesis by testing major public models on diverse tasks such as translation and algorithmic reasoning. Key findings: Isolating the Learn and Apply components maintained high accuracy, demonstrating the viability of the separation. Task vectors clustered by task and remained consistent within tasks, indicating they encode meaningful task representations. Injecting another task's vector into the model caused it to override contradictory examples and follow the vector, highlighting the vector's dominance. Vectors induced relevant token distributions despite those terms being absent from the examples, suggesting semantic encoding of the task. Taken together, these results provide substantial evidence that in-context learning involves creating a task vector that encapsulates the examples' logic to then guide behavior on new queries. While open questions remain regarding implementation details, this is a significant step towards demystifying an interesting AI capability. Full writeup. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [D] What is more Important loss or accuracy?
    I have created a basic classification model and there is something that I don't fully comprehend, as the loss decreases the accuracy increases (I assume this is how it should be in ideal scenarios) while this is the general trend there is a point where the loss is minimum and while accuracy at that point is high it's not the highest. Why would such a phenomenon occur? And since it occurred what is a better metric for the evaluation of the model? submitted by /u/rakk109 [link] [comments]  ( 9 min )
    [P] Locally hosted audio-to-text transcription - model and hardware?
    Hi, I'm looking for a locally hosted LLM to transcribe audio files to text. I need this for my business, but with absolute privacy (witness testimony recordings and other highly sensitive data). I figured I'd just buy a new computer which never gets connected to the internet and is dedicated only to audio processing to have absolute security. My questions are: - Which is the best model to use? I prefer accuracy and don't mind processing time as long as it's getting done within a few hours to even a day or two (I need to transcribe maximum 1 file per day, but up to six hours of audio), so I figured the large Whisper - probably WhisperX ? - would be my best bet. Are there comparable non-openAI models? (I need diarization) - What hardware should I get for this? Cost is secondary/irrelevant, although I don't want to spend 5 figures on a GPU - I can accept some processing time submitted by /u/Jealous_Pomelo_1172 [link] [comments]  ( 9 min )
    Best NLP Package in Python to extract medical test results from medical notes? [D]
    I am trying to extract FEV1 (forced expiratory volume) values from a dataset that contains a column with report notes from the doctor assessing the patient with pulmonary function testing. I have been able to build out a sort of solution with regexes in Python, and that's somewhat effective. But I've been instructed to code up an alternative using a more machine learning-based approach. I wanted to use spaCy to accomplish this but I'm not sure exactly how to implement the code nor if spaCy is the best package to use for this task. Here is my regex code that works decently. It's pretty messy and have to take into account a ton of edge cases which can get cumbersome. This is why I'd like to find a more automated solution. #Attempting to add in percents df = pd.read_excel('[mypath]/pft_tiu…  ( 11 min )
    [P] Adala – an open source Autonomous DAta (Labeling) Agent framework that helps you automate data processing and data labeling
    Hi r/MachineLearning, We have just open sourced Adala - a robust framework for implementing agents that specialize in advanced data processing tasks, starting with data labeling and generation. Agents combine knowledge outputs from LLMs and action on them in production systems, thus their reliability to correctly and consistently perform operations is critical. We saw an opportunity to create a new agent framework that could dramatically increase the efficiency of data labeling (and broader application across data processing tasks), with the unique ability to be guided by human feeback. To ensure agents remember and build upon their experiences, Adala provides a Memory component—a dynamic storage space for the agent's acquired knowledge. For instance, retrieving the previous experiences of an agent’s errors (and subsequent human feedback) allows them a starting point from which to branch off into learning or improving skills. To allow Adala to produce reliable agents, we devised two main strategies: Supervision Integration: Provide agents with 'ground truth data'—well-defined examples that serve as a learning foundation. This foundational data not only sets the learning parameters for the agent but also defines its operational environment. Constrained Generation: Ensuring that an agent's predictions are within a defined and bounded range of outputs. Let us know what you think in the comments below or by contributing to the repo. Adala framework overview ​ submitted by /u/pirate7777777 [link] [comments]  ( 9 min )
    [P] Pre-training dataset
    I'm trying to pre-train my own language model on some high quality datasets (TinyStories,tiny-textbooks...). Some of these datasets include input-output data and some are just text (stories), I was wondering how should I format the data for pre-training. Should I only use plain text like stories and webtext in pretraining then the rest in fine-tuning (adding instruction tokens) or should I just train with all of the datasets at pre-training with the special tokens where they are needed? submitted by /u/Additional-Ad-7043 [link] [comments]  ( 9 min )
    [Research] large language/speech models and voice interface research
    Hey ML folks, My friend is working on his academic research project where he is exploring voice research spealizing in large speech models. If you have time, help him advance his research on voice interfaces. should take 2 mins max. https://forms.gle/a3PaQmYEiqRDxY4Z8 whats in it for me ? you can share email to get a copy of the research and listen what the rest of us have said. Thanks! submitted by /u/deep-thoughts-guy [link] [comments]  ( 9 min )
    [R] Open Source video enhancement options
    We work the disease prediction based on video classification and would like to test what improving the quality of videos would do for our models, any specific components, apps or packages we should test? So far used UpScayl, not sure how that ranks submitted by /u/sladebrigade [link] [comments]  ( 9 min )
    [P] OSS tool to interactively explore Hugging Face datasets with one line of code
    submitted by /u/44sps [link] [comments]  ( 8 min )
    [P] Training a transformer from scratch
    Hello! I would like to train a transformer network from scratch, without pre-training, on a language modeling task (next work prediction) or a sequence-to-sequence task (translation). For the language modeling task, I tried with the Shakespeare dataset, and other simpler ones (e.g., Beatles songs), but it tends to overfit quite quickly on the training set, probably because the corpus is too short. I know that Andrej Karpathy did it with the Shakespeare dataset in his YouTube video, but he used a character-wise tokenisation, which dramatically reduces the validation loss on the next-work prediction task, given that the vocab size is tiny. I guess that at the end the generation process provides a similar quality of text as when a word tokenizer is used. Surprisingly, I had quite good results by training from scratch an Encoder-Decoder model, for English-to-French translation (using the 8 million examples of the Tatoeba dataset). I guess here, the overfitting is less prominent because there are more datapoints, and that the possibility of predictions are much more constrained, due to the input sequence. What are you guys experience with this? I would be happy to know how I can train my transformer without having to use a pre-trained architectures or spend weeks on GB datasets. Thank you! submitted by /u/rem_dreamer [link] [comments]  ( 9 min )
    Data analysis vs ML engineering [D]
    Do you think coursera certifications, besides a master in electrical engineering, can help us find better occupational positions? I am told that for a beginner it is better to start with jobs in Data analysis rather than going directly to ML engineering. Is it corr? Is data analysis a prerequisite for ML? submitted by /u/Street-Regular-9924 [link] [comments]  ( 9 min )
    [D][R] How should the architecture of a transformer be scaled?
    When increasing the parameters of a (decoder-only) transformer, one has a choice around how to spend that increased budget -- number of layers vs embedding dim vs number of heads. Anyone know if there's solid guidance out there for the proportions each aspect should be scaled in? E.g. looking at LLaMa (https://arxiv.org/abs/2302.13971), they seem to scale the first sizes two proportionally, but for larger sizes, n heads grows more slowly. https://preview.redd.it/ytdfk1d5rbwb1.jpg?width=1422&format=pjpg&auto=webp&s=abf22ac369ec5ecf81ff07b0d8a095f884efe729 submitted by /u/Tea_Pearce [link] [comments]  ( 9 min )
    [D] What are some existing datasets for training LLMs to perform reasoning, acting as agents?
    There are a lot of great open datasets for fine-tuning LLMs for instruction following (e.g LIMA, self-instruct, dolly-15k, etc) and as chat bots (OASST, etc). One thing I have not really seen yet are datasets that involves planning and tool use. Is anybody working on something like that or have come across any? I'm interested in working on one. If anybody has ever attempted this, I would really appreciate any advice. P.S I do note that "reasoning" should be more rigorously defined and scoped, but I think some ambiguity around an intellectual discussion like this can help. submitted by /u/notllmchatbot [link] [comments]  ( 9 min )
    [D] Open-source SOTA Audio-to-Audio: how do I sound like a famous actor?
    Hello people, I would like to learn how to turn the recording of my voice to sound like a famous person. I imagine I would take an open source model and fine-tune it using data I will collect. Can someone point me towards the best sounding current models that I could adapt for this purpose? Thank you so much. submitted by /u/gonzales82 [link] [comments]  ( 9 min )
    [D] Guidance needed for upcoming AI/ML PhDs on selecting research topics with lasting impact
    Many upcoming Ph.D. students in AI/ML are facing the difficult decision of identifying promising research topics that will stand the test of time over the time of their Ph.D. studies. With the rapid progress in AI, especially in the NLP field, many incremental research tasks have been effectively "solved". Need to choose an area where there is ample room for open-ended inquiry and meaningful contributions over 4-5 years of PhD research. While large language models have shown impressive advances recently, their capabilities may plateau during a Ph.D. (if starting the Ph.D. from next year ~ 4 years) timeframe. How should aspiring researchers choose topics resilient enough to withstand the test of time and allow them to push the field forward through their Ph.D. work? For those with experience in AI research who have seen changes in the field over time: What emerging trends or broad areas do you see as fertile ground for AI/ML PhD research now and in the coming years? Can you highlight any intriguing subfields worthy of deeper investigation by aspiring PhD students? What open problems or applications warrant more attention from the upcoming generation of PhD researchers? Some of tending Research topics so far: LLM in a specific domain Prompting Evals LM interfaces Safety Understanding LMs Emergence Any advice on identifying PhD research topics with longevity would be greatly appreciated by aspiring graduate students. submitted by /u/aadityaura [link] [comments]  ( 9 min )
    [P][R] Test-Val scores, how much difference isn't problematic.
    Hello folks, I'm working on a medical image dataset using EM loss and asymmetric pseudo labelling for single positive multi-label learning (only training using 1 positive label). I'm using a densenet121 and on a chest x-ray dataset. I see a difference of 10% in my validation vs test score (score = mAP: mean average precision). The score seems okay and was expected but the difference is bothering me. I understand that it's obvious but any visual insights from your side? (Attaching plot below) The validation set consist less than half of test set samples. (It is the official split; I have nothing to do with it). I feel it is the reason, as ofcourse more the randomness in a set, poorer the convergence. ​ https://preview.redd.it/nseqy1mw5bwb1.png?width=577&format=png&auto=webp&s=fbd63e8a5f4920a8109b6a75aeb039a3965bba58 Do share any experiences or suggestions! submitted by /u/ade17_in [link] [comments]  ( 9 min )
    [D] Are there method that can extract interaction between person in text?
    I want to extract interaction between persons in short text. For example, "Sally will buy a new phone. Ted will help her." contains interaction between persons. However, "Japanese Karate champion won the first prize." and "Sally missed her friends, Ted and Tom" does not contain interaction between persons. Is there any tools or methods that can extract interactions? submitted by /u/tkddnjs1234 [link] [comments]  ( 9 min )
    [D] Who are some outspoken AI people who speak against AI ethics and regulation?
    I'm interested in learning more about the perspectives of AI researchers and practitioners who are critical of AI ethics and regulation. I'm particularly interested in those who argue that AI ethics and regulation are unnecessary or harmful. Please note that I'm not asking for people who are simply skeptical of certain AI ethics proposals or who believe that AI ethics should be implemented in a specific way. I'm more interested in people who argue that AI ethics is a fundamentally flawed concept or that AI should not be regulated at all. submitted by /u/Periplokos [link] [comments]  ( 9 min )
    Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models
    submitted by /u/simandl [link] [comments]  ( 8 min )
    [D] What should I do for training when data to predict has random distribution?
    I was taught that when doing imbalance classification, the training data should be augmented to more or less match the number of classes, but the validation data should have the same distribution as the test data. And the test data should have the similar distribution as the data I will actually predict. But what if real data's distribution is quite random? What validation data distribution should I use? (I got 14 classes to classify, and 1 of classes has 52% proportion, and small ones have 0.9%, and 0.17% proportion. Practitioners who would use my model input data that only 3 classes to classify, and they can be very small proportion. The training data before augmentation was created by integrating data with this irregular distribution.) submitted by /u/poemfordumbs [link] [comments]  ( 9 min )
  • Open

    Looking back at wildfire research in 2023
    Posted by Yi-Fan Chen, Software Engineer, and Carla Bromberg, Program Lead, Google Research Wildfires are becoming larger and affecting more and more communities around the world, often resulting in large-scale devastation. Just this year, communities have experienced catastrophic wildfires in Greece, Maui, and Canada to name a few. While the underlying causes leading to such an increase are complex — including changing climate patterns, forest management practices, land use development policies and many more — it is clear that the advancement of technologies can help to address the new challenges. At Google Research, we’ve been investing in a number of climate adaptation efforts, including the application of machine learning (ML) to aid in wildfire prevention and provide information to…  ( 92 min )
    Grammar checking at Google Search scale
    Posted by Eric Malmi, Senior Research Scientist, and Jakub Adamek, Senior Software Engineer, Google, Bard Team Many people with questions about grammar turn to Google Search for guidance. While existing features, such as “Did you mean”, already handle simple typo corrections, more complex grammatical error correction (GEC) is beyond their scope. What makes the development of new Google Search features challenging is that they must have high precision and recall while outputting results quickly. The conventional approach to GEC is to treat it as a translation problem and use autoregressive Transformer models to decode the response token-by-token, conditioning on the previously generated tokens. However, although Transformer models have proven to be effective at GEC, they aren’t part…  ( 91 min )
  • Open

    We hate how "black box" neural nets are, we made a thingy in an attempt to demystify their "thinking."
    submitted by /u/DeltaStarStudiosPR [link] [comments]
    AI ‘breakthrough’: neural net has human-like ability to generalize language
    submitted by /u/nickb [link] [comments]
    [Long read] Deep dive into AutoGPT: A comprehensive and in-depth step-by-step guide to how it works
    https://airt.hashnode.dev/long-read-deep-dive-into-autogpt-a-comprehensive-and-in-depth-step-by-step-guide-to-how-it-works submitted by /u/Harish_Mohanraj [link] [comments]
  • Open

    Trust in AI, Data Poisoning, and Involving People in Maturing AI
    submitted by /u/fookingyeah [link] [comments]
    Baby AGI and AgentGPT : Exploring Autonomous AI-Agents
    submitted by /u/Tao_Dragon [link] [comments]
    How can i use AI to research for my thesis?
    hey all imnewto this can you help me please ? submitted by /u/proptuxiakoskariolis [link] [comments]
    When I use AI to generate Halloween candy wrappers and then print them out...
    submitted by /u/Sea_Permit5660 [link] [comments]
    DTIYS Challenge Submission Sample Art for Oh my Anne
    submitted by /u/Oh_my_Winnie [link] [comments]
    One-Minute Daily AI News 10/24/2023
    OpenAI Executives Sam Altman Say AI Will Be Able to Do Any Job Within 10 Years.[1] Snapdragon 8 Gen 3 chipset officially announced with AI-driven functionalities.[2] Google parent Alphabet reported its third quarter earnings Tuesday, which showed more spending on AI infrastructure and muted cloud growth, culminating into several questions for executives about how all the efforts around artificial intelligence are actually going to turn into real money.[3] Adult film star Riley Reid(I don’t know who she is) launches Clona.AI, a sexting chatbot platform.[4] Sources: [1] https://www.wsj.com/podcasts/the-journal/a-conversation-with-openais-sam-altman-and-mira-murati/7c89e85f-9d7e-4569-b67d-6a777374eada [2] https://headtopics.com/my/snapdragon-8-gen-3-chipset-officially-announced-with-47616340 [3] https://www.nbcdfw.com/news/business/money-report/wall-street-wants-to-know-how-googles-going-to-profit-from-ai/3368989/ [4] https://www.engadget.com/adult-film-star-riley-reid-launches-clonaai-a-sexting-chatbot-platform-000509221.html submitted by /u/Excellent-Target-847 [link] [comments]
    Would majoring in artificial intelligence be worth it?
    The AI boom has made it more relevant than ever, and its applications are truly awe-inspiring. While it’s far from perfect, it has helped me greatly in writing, by generating content to inspire me and my projects. I have a smattering of skills, none that I’d consider especially good enough to double down upon, but learning how to optimize language learning models to produce the most adequate results would be pretty neat. I just don’t know what I want to do with my education, I’ve completed my basics and as such have a blank slate to play with, but I’m worried that whatever I select, it will be no good, and just result in lost time and money. Tertiary education seems like a necessity in the modern world, especially since the job world is more ruthless than ever, and the economy is in ashes. submitted by /u/Niobium_Sage [link] [comments]
    Need to find an Ai
    Which AI does these cartoon? submitted by /u/hommedufuture [link] [comments]
    I've been playing around with Midjourney a little bit and this is what I got.
    ​ https://preview.redd.it/2emqr4z8a9wb1.png?width=928&format=png&auto=webp&s=437547e7e86e23298b7c778cada9863385ce961d PROMT close up of eye, close up of girl eye, mangekyo sharingan, super close up, pretty eye, black and red eye, naruto anime, long eyelashes, anime eye, 2d art eye, --s 180 --style expressive ​ https://preview.redd.it/4dffpg2ca9wb1.png?width=928&format=png&auto=webp&s=fc485a604460f7e544d0490d0bee65f984d8a5b3 PROMT **stained glass, it was meticulously written, picture with elaborate writing, cute girl smile with Rabbit,Flower, bold and strong line drawing, vivid acrylic painting, vivid thick paint, vivid, plain background, beautiful proof, highest resolution 16K, beautiful anime girl that is betrayeded by a Rabbit, hair is short, ferret, Beautiful lightcyan high ligh…
  • Open

    Detection and high-frequency monitoring of methane emission point sources using Amazon SageMaker geospatial capabilities
    Methane (CH4) is a major anthropogenic greenhouse gas that‘s a by-product of oil and gas extraction, coal mining, large-scale animal farming, and waste disposal, among other sources. The global warming potential of CH4 is 86 times that of CO2 and the Intergovernmental Panel on Climate Change (IPCC) estimates that methane is responsible for 30 percent of observed […]  ( 12 min )
  • Open

    Research Focus: Week of October 23, 2023
    In this issue: Kosmos-2.5: A Multimodal Literate Model; Can vine copulas explain complex relationships of weather variables; New system accelerates the adaptive training process; Structural inequalities and relational labor in the influencer industry. The post Research Focus: Week of October 23, 2023 appeared first on Microsoft Research.  ( 10 min )
  • Open

    Next-Gen Neural Networks: NVIDIA Research Announces Array of AI Advancements at NeurIPS
    NVIDIA researchers are collaborating with academic centers worldwide to advance generative AI, robotics and the natural sciences — and more than a dozen of these projects will be shared at NeurIPS, one of the world’s top AI conferences. Set for Dec. 10-16 in New Orleans, NeurIPS brings together experts in generative AI, machine learning, computer Read article >  ( 8 min )
  • Open

    The right to perform RL on games
    Hi all, I'm new to learning RL. I want to train an agent to clear a game such as vampire survivor, super mario brothers, etc, as my first research/project. I talked with my tutor , he reminded me to pay attention to copyright issues and that I needed a permission to use these works for training. I guess I could get permission by asking the game's author directly, but before that, or for games produced by some big companies, where can I find information about the rights? Although reading the game's memory is a challenge for me, it's cool to see a agent clear a game. submitted by /u/Ruine_fff [link] [comments]
    Building Doom with AI enemies
    I'm planning to go down the rabbit hole of using RL to train agents in doom/vizdoom The goal would be to create a version of doom where the enemies have AI and are adaptive. Doom and Doom 2 are some of my all time classic favorites. There are people still making maps to this day! Let me know on what you think about the idea? Project plan - Nov 2023 : RL refresher from the David Silver RL course on YouTube Dec 2023 : start working on openAI and stablebaselines3 and watch Nicholas Renotte's videos Jan 2024 : play around with the Doom WAD and try to see if you can make changes to the doom engine + Training and setting up custom env Feb 2024 : hopefully first level with enemy AI created Mar 2024 : release fully completed open source version of the game Background: I work at a hedge fund, have some basics on reimbursement learning, although it has been a long long time. Time is a bit limited after 12 hours or work and 2 hours of gym (the real human world one) so kinda stretching this out Any suggestions are welcome. Any courses, books, libraries and tools you'd suggest? submitted by /u/Sahil231090 [link] [comments]
    "Surprise" for learning?
    I was recently listening to a TalkRL podcast where Danijar Hafner explains that Minecraft as a learning environment is hard because of sparse rewards (30k steps before finding a diamond). Coincidentally, I was reading a collection neuroscience articles today where surprise or novel events are a major factor in learning and encoding memory. Does anyone know of RL algorithms that learn based on prediction error (i.e. "surprise") in addition to rewards? submitted by /u/CognitoIngeniarius [link] [comments]
  • Open

    Frontier Model Forum updates
    Together with Anthropic, Google, and Microsoft, we’re announcing the new Executive Director of the Frontier Model Forum and a new $10 million AI Safety Fund.  ( 4 min )

  • Open

    A warning about an unknown danger of AI. Current uses of AI have been overwhelmingly positive but there is an unknown danger that I would like to speak to.
    I want to warn AI companies and developers about a danger that is not known about regarding AI. The reason it is not known about regarding AI is that it isn't known about in general and so the AI community can hardly be blamed for that. Unfortunately, the danger here has to do with the fundamental nature of human society and social interaction as it stands at this time. The issue is that there is 'hidden language' used in social communication and unlike typical conceptions of things like body language this is not auxiliary to our rational purposes, rather our rational purposes are auxiliary to the hidden communication. One way of describing it would be that our formal language is a 'carrier wave' to encode other information about our status and the status of others. So our communications …
    AI Psychology Test: What happens in viewers' mind when news segments about important major events shift to commercials where the announcer is talking like a comic character?
    When news segments covering major, often serious, events abruptly switch to lighthearted or comical commercials, a cognitive dissonance can occur in the viewer. Here's why: news programs are designed to engage the viewer's analytical faculties. They present facts, figures, and expert opinions, demanding cognitive effort to understand the implications. The viewer is in a "serious" mode, applying critical thinking to absorb the information. Commercials, particularly the comic ones, often aim for emotional engagement rather than intellectual analysis. They use humor, catchy jingles, and attractive visuals to create a positive association with the product being advertised. When the transition between these two contrasting tones is sudden, the viewer has to perform a rapid mental shift from analytical to emotional engagement. This can be jarring. This dissonance can have a few different outcomes. For one, it might diminish the impact of both the news segment and the commercial. The viewer might find it difficult to fully engage with either, as the cognitive "gear shifting" can be distracting. Secondly, this dissonance can potentially undermine the gravitas of the news. When sandwiched between comic commercials, serious topics might lose some of their perceived importance. Lastly, it can make the commercial less effective. The viewer, still in a serious mindset, may not be as receptive to the emotional triggers that the commercial aims to pull. So, in essence, this rapid shift can dilute the efficacy and impact of both the news and the advertising, while causing cognitive friction for the viewer. CGPT-4 submitted by /u/Georgeo57 [link] [comments]
    Any good AI-integrated video games?
    Does anybody know of any good AI integrated games that have been released or are in beta? I'm really interested to see how people have incorporated the current boom in AI into game design. submitted by /u/Rfallmann [link] [comments]
    Managing AI Risks in an Era of Rapid Progress
    The rapid progress of AI development brings both opportunities and risks. While AI systems have the potential to cure diseases and elevate living standards, they also pose large-scale risks that we are not prepared to handle. Without proper safety measures and ethical considerations, advanced AI systems could amplify social injustice, erode social stability, and enable criminal activities. The development of highly advanced autonomous AI systems also raises concerns about the pursuit of undesirable goals and the loss of human control. To ensure a positive outcome, research breakthroughs in AI safety and ethics are needed, along with effective government oversight. Source : https://managing-ai-risks.com/ submitted by /u/NuseAI [link] [comments]
    Deepfakes Just Got Very Real
    Interesting read about deepfakes that started with a Reddit post. https://www.linkedin.com/pulse/deepfakes-just-got-very-real-scott-clark-sfurc submitted by /u/scottimherenowwhat [link] [comments]
    How AI could change Google search and wipe out $68 billion SEO industry | Fortune
    Oh well 🤷‍♂️ submitted by /u/AminoOxi [link] [comments]
    🦾ERNIE 4.0 vs GPT-4, Tightened AI Chip Restrictions, Alibaba Tencent Fund AI Startup, and China's Global AI Governance Initiative
    submitted by /u/trcytony [link] [comments]
    Stanford AI Conference - New Horizons in Generative AI: Science, Creativity, and Society - Livestreaming Now
    submitted by /u/Nice-Inflation-1207 [link] [comments]
    Dancing with Light: A Hummingbird's Enchanted Encounter.
    submitted by /u/IllustriousVideo6145 [link] [comments]
    150+ Awesome ''Act As'' ChatGPT Prompts
    submitted by /u/Senior_tasteey [link] [comments]
    ChatGPT, invent comics for robots.
    submitted by /u/Philipp [link] [comments]
    An A.I. video interpretation of "Metamorphosis Two" by Philip Glass
    submitted by /u/AnimalsChasingCars [link] [comments]
    I have a question
    What’s the best voice ai for song covers? Like I wanna do someone like Donald Trump, Cartman, Ice King/Simon singing The Boys (Eng Ver) by SNSD. Also it has to be free! submitted by /u/Ok-Upstairs-9887 [link] [comments]
    Apple and AI
    Apple has been behind in the AI field compared to companies like OpenAI, Google, Microsoft, and Amazon. While Apple has made improvements in autocorrect and AI features in Photos, it needs to catch up to remain competitive. Apple executives have been scrambling to make up for lost time and have been working on generative AI technology. There is anxiety within Apple about whether their AI/ML team can deliver. Source : https://daringfireball.net/2023/10/apple_and_ai submitted by /u/NuseAI [link] [comments]
    🚀 Gaming with ChatGPT using Encrypted Prompts and Prompt Injection! 🎮
    submitted by /u/Gloomy_Recognition_4 [link] [comments]
    How are neobanks utilizing AI to offer more accurate and personalized financial advice to customers?
    Your answers are appreciated. submitted by /u/Cygnet-Digital [link] [comments]
    One-Minute Daily AI News 10/23/2023
    The U.S. Senate will hold the second in a series of bipartisan AI Insight Forums on Tuesday, Oct. 24, where senators will hear from some of the most influential tech leaders to help inform regulations around the technology.[1] Microsoft announces A$5 billion investment in computing capacity and capability to help Australia seize the AI era.[2] Samsung is going all in with the AI performance of the Galaxy S24 phones.[3] Reddit has reportedly decided to block AI startups from scraping data from its website. This move prevents third-party companies from using Reddit’s data to train their machine-learning models without permission.[4] Sources: [1] https://news.asu.edu/20231020-government-calling-tech-leaders-help-crafting-artificial-intelligence-legislation [2] https://news.microsoft.com/en-au/features/microsoft-announces-a5-billion-investment-in-computing-capacity-and-capability-to-help-australia-seize-the-ai-era/ [3] https://www.androidheadlines.com/2023/10/samsung-galaxy-s24-smartest-ai-phone.html [4] https://www.androidheadlines.com/2023/10/reddit-block-ai-startups-scraping-data.html submitted by /u/Excellent-Target-847 [link] [comments]
    オレの攻撃からお前は逃れられぬ。 いかなる人間も、死という現実から決して逃れられぬように。 受け入れることだ。定めよ。
    submitted by /u/nicdunz [link] [comments]
    Anti deepfake headset V2
    You can find out more here in the comments submitted by /u/ahauss [link] [comments]
  • Open

    [D] How should I calculate the weights for a multi-label classification task where the labels are dependent among one another?
    I'm not sure if I worded the title correctly. Let me elaborate on the scenario. I have a multi-label image classification task where I'm trying to classify the gender of clothing images. The two labels that we can predict are Male and Female, hence the final logit vector's size would be something like [batch_size, 2]. Depending on the predictions, we're mapping the following binary values to different categorical values: [0, 0]: Unknown [0, 1]: Male [1, 0]: Female [1, 1]: Unisex The overall distribution is heavily imbalanced with Male being the minority class. I'm trying to calculate class weights to favor Male, but the problem is that the size of the weight tensors to be provided to the loss function should have a length of 2. I say this is a problem because although the number of prediction logits is 2, the actual number of classes is 4. I used the word "dependent" in my title because, for example, [1, 1] wouldn't necessarily mean that the image has the labels Male and Female, rather that it's a completely new Unisex label. Again, not sure if the usage of the word is appropriate. Anyway I've thought of making a custom loss function to first map the binary labels to their respective categorical values, but am wondering if there is any other way to go about this. submitted by /u/Seankala [link] [comments]  ( 9 min )
    [D] LSTM: Train & Val losses not converging
    I am training an LSTM model for path prediction where I'm feeding in OBT (on-board Time) and X matrix as input and Y matrix is the predecessor matrix generated using Scipy.Dijkstra ​ This is the model architecture for reference, This is the model architecture for reference, I've tried multiple iterations of this similar model, but the training and validation loss, are not converging. The best train_loss i've been able to achieve is 88k mse and 400 mse val_loss I've uploaded the dataset here: GitHub - mathur-exe/LSTM_Dataset Training Progress: Epoch 1/100 342/342 - 17s - loss: 22606898.0000 - val_loss: 61414736.0000 - 17s/epoch - 49ms/step Epoch 2/100 342/342 - 14s - loss: 7990657.0000 - val_loss: 3699703.5000 - 14s/epoch - 40ms/step Epoch 3/100 342/342 - 13s - loss: 4130298.7500 - val_loss: 136808.1094 - 13s/epoch - 38ms/step Epoch 4/100 342/342 - 12s - loss: 2747299.2500 - val_loss: 35710.1680 - 12s/epoch - 35ms/step Epoch 5/100 342/342 - 12s - loss: 2558378.2500 - val_loss: 3383.4780 - 12s/epoch - 36ms/step Epoch 6/100 342/342 - 13s - loss: 1214455.8750 - val_loss: 111625.2891 - 13s/epoch - 37ms/step Epoch 7/100 342/342 - 19s - loss: 337480.2500 - val_loss: 68686.6094 - 19s/epoch - 55ms/step Epoch 8/100 342/342 - 15s - loss: 316366.7188 - val_loss: 2059.3557 - 15s/epoch - 44ms/step Epoch 9/100 342/342 - 20s - loss: 293117.0312 - val_loss: 20961.5469 - 20s/epoch - 58ms/step Epoch 10/100 342/342 - 17s - loss: 575945.1875 - val_loss: 503602.8438 - 17s/epoch - 50ms/step Epoch 11/100 342/342 - 13s - loss: 290962.8750 - val_loss: 62491.9375 - 13s/epoch - 37ms/step Epoch 12/100 342/342 - 12s - loss: 1125042.5000 - val_loss: 36054.6836 - 12s/epoch - 36ms/step Epoch 13/100 ... 342/342 - 16s - loss: 230900.7969 - val_loss: 48309.6094 - 16s/epoch - 47ms/step Epoch 93/100 342/342 - 23s - loss: 232846.6094 - val_loss: 82926.6875 - 23s/epoch - 67ms/step submitted by /u/Gaurang_Mathur_ftw [link] [comments]  ( 9 min )
    [D] Will ChatGPT remove the need for data annotation?
    I wrote a blog post about this detailing my experience, which I will attach at the bottom but I want to hear opinions of people. It is something I've actively been thinking about, and would like to know potential pitfalls and why it may not work, rather than the huge promise it holds. https://ozanciga.wordpress.com/2023/10/24/will-chatgpt-remove-the-need-for-data-annotation/ submitted by /u/ozanciga [link] [comments]  ( 9 min )
    [R] Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture
    submitted by /u/hzj5790 [link] [comments]  ( 9 min )
    [D] Is it better to create a different set of Doc2Vec embeddings for each group in my dataset, rather than generating embeddings for the entire dataset?
    I'm using Top2Vec with Doc2Vec embeddings to find topics in a dataset of ~4000 social media posts. This dataset has three groups: Posts from a company (3%) Posts from this company's potential customers (82.5%) Posts from this company's competitors (14%) The purpose of this analysis is to look at the topics the company is posting about on social media and see how it compares to the things that their customers and competitors are posting about. Since the values of Doc2Vec embeddings depend on the other documents in the dataset, I'm worried that topics in smaller groups are going to be drowned out by the larger group. I'm worried that the differences between the document vectors in the smaller group are going to be made smaller by presence of the documents from the larger group, which may represent a much wider array of different topics. submitted by /u/abelEngineer [link] [comments]  ( 9 min )
    [D] How would you do it? Handling multi-turn QA conversation with matching of questions to vector database.
    I have been giving this some thought and would appreciate some outside input, maybe someone has some experience they could share! I am attempting to create a QA chatbot that is limited to answering questions from a pre-determined set of question and answer pairs I have in a vector database. Currently I create embeddings of the question using OpenAI and query a vector database for similar "reference question" - if the similarity score is high enough I proceed and use the answer text I have stored in the metdata as "context" for the answer generation. I would now like to extend this to include conversational history. The issue I am facing however, is that a follow-on question may not hit the similarity threshold. Considering a follow-up question would typically not be worded in a way that …  ( 10 min )
    [P] The ML Practitioner, a publication about all things machine learning and MLOps
    Hi all, my wife and I have recently started a new publication called The ML Practioner. If you're interested in writing for us, please send us a link of your unpublished draft here. Either way, please subscribe to us if you're interested in this kind of content! submitted by /u/kanxx030 [link] [comments]  ( 9 min )
    [D] efficacy of cold start preferences on recs systems
    Hi all, Are there good papers about the efficacy of cold start explicit preference collection (think Netflix “pick some movies you like”) on the recs systems? I haven’t been able to find any so far. One key aspect I’m looking for is if these are effective, how long they are relative to just implicit actions the user takes. Thanks submitted by /u/steathilynecessary [link] [comments]  ( 9 min )
    [D] Embedding models ranked by encode speed?
    Hello, the sbert.net has a list where you can sort by encode speed but its a very small subset of the HuggingFace MTEB leaderboard. AFAICT, the HuggingFace leaderboard / model pages don't have this information. Is there a list where I can see a more up-to-date list of models by encoding speed? submitted by /u/rsamrat [link] [comments]  ( 9 min )
    [D] Finite State Transducers and language productivity
    In the context of NLP, will language models based on finite state transducers (since they are finite) ultimately fail to put language's productive nature to good use? All the possible outputs a finite state transducer can produce are predictable, while all the possible outputs a given natural language can produce are much less predictable? submitted by /u/RecordingOk5720 [link] [comments]  ( 9 min )
    [P] Equinox KV Cache
    I've been trying to implement a kv cache in my language model but have been unsuccessful so far due to the dynamic shapes. I've seen some implementations in flax but was wondering if it was possible to implement in equinox as that's what I'm using and prefer over others like flax. If anyone can point me in the right direction or help with the implementation that would be great! PS: I can provide any code if wanted to help submitted by /u/Additional-Ad-7043 [link] [comments]  ( 9 min )
    Explainable Boosting Machine Local and Global Explanation plots label size [D]
    I am using EBM for a research, the local and global explanation plots it produces come with preset font size, I want to change the resolution of the figure and the font size of labels and x and y ticks in the explanation plots. I have looked for it on the InterpretML github page and issues and scrolled through various webpages but haven't found anything helpful. Used gpt but it doesnot help either, it tries to use matplotlib but EBM plots are not compatible with it. Please share any way it can be solved, because the plots labels are unreadable in the article if used as it is. submitted by /u/Horseman099 [link] [comments]  ( 9 min )
    [D][P] What is the metric for early stopping in YOLOv8 detection?
    I am trying to fine tune the yolov8 detection model an was going through the code base of ultralytics.I found this piece of code in the engine.trainer # Early Stopping if RANK != -1: # if DDP training broadcast_list = [self.stop if RANK == 0 else None] dist.broadcast_object_list(broadcast_list, 0) # broadcast 'stop' to all ranks if RANK != 0: self.stop = broadcast_list[0] if self.stop: break # must break all DDP ranks I'm familiar with how the early stopping works and not sure what they are doing here does this get invoked by default?? what is the metric that they use in order to stop it?? upon further inspection i found this self.stopper, self.stop = EarlyStopping(patience=self.args.patience), False which is imported as from ultralytics.utils.torch_utils import (EarlyStopping, ModelEMA, de_parallel, init_seeds, one_cycle, select_device, strip_optimizer) please help me find out what metric they use to stop this and if the earlystopping is invoked by default submitted by /u/rakk109 [link] [comments]  ( 9 min )
    [P] The N Implementation Details of RLHF with PPO
    We are happy to share a great repro of OpenAI's early RLHF codebase, with nearly identical learning curves. We also summarized implementation details (did you know Adam Optim's implementation details could impact RLHF?) 📜 Blog post:https://huggingface.co/blog/the_n_implementation_details_of_rlhf_with_ppo 💾 Code: https://github.com/vwxyzjn/lm-human-preference-details submitted by /u/vwxyzjn [link] [comments]  ( 9 min )
    ML [project] [p]
    What are best ways to collect database for any ml project submitted by /u/GingSkywalker [link] [comments]  ( 8 min )
    [R] Feature Space Reduction Method for Ultrahigh-Dimensional, Multiclass Data: RFMS
    We are excited to announce the publication of our groundbreaking scientific paper in Machine Learning: Science and Technology titled “Feature Space Reduction Method for Ultrahigh-Dimensional, Multiclass Data: Random Forest-Based Multiround Screening (RFMS)” by Gergely Hanczar, Marcell Stippinger, David Hanak, Marcell T Kurbucz, Oliver M Torteli, Agnes Chripko, and Zoltan Somogyvari. Published on: 19 October 2023 DOI: 10.1088/2632-2153/ad020e Volume 4, Number 4 In recent years, several screening methods have been published for ultrahigh-dimensional data that contain hundreds of thousands of features, many of which are irrelevant or redundant. However, most of these methods cannot handle data with thousands of classes. Prediction models built to authenticate users based on multichannel biometric data result in this type of problem. In this study, we present a novel method known as random forest-based multiround screening (RFMS) that can be effectively applied under such circumstances. The proposed algorithm divides the feature space into small subsets and executes a series of partial model builds. These partial models are used to implement tournament-based sorting and the selection of features based on their importance. This algorithm successfully filters irrelevant features and discovers binary and higher-order feature interactions. To benchmark RFMS, a synthetic biometric feature space generator known as BiometricBlender is employed. Based on the results, the RFMS is on par with industry-standard feature screening methods while possessing many advantages. r/IAMA - Oct 26 with the founders of Cursor Insight. https://bit.ly/AMAwithCursorInsight-GoogleCalendar ​ R/IAMA - Oct 26 with the founders of Cursor Insight. submitted by /u/CursorInsight [link] [comments]  ( 9 min )
    [N] New letter from Yoshua Bengio, Geoffrey Hinton, and others: Managing AI Risks in an Era of Rapid Progress
    Signatories include Turing Award winners Yoshua Bengio, Geoffrey Hinton, as well as others academics and experts. In 2019, GPT-2 could not reliably count to ten. Only four years later, deep learning systems can write software, generate photorealistic scenes on demand, advise on intellectual topics, and combine language and image processing to steer robots. As AI developers scale these systems, unforeseen abilities and behaviors emerge spontaneously without explicit programming1. Progress in AI has been swift and, to many, surprising. The pace of progress may surprise us again. Current deep learning systems still lack important capabilities and we do not know how long it will take to develop them. However, companies are engaged in a race to create generalist AI systems that match or ex…  ( 10 min )
    [D] Generative Food
    Hey guys, I sometimes post about tiny ML projects we work on. This time, we talk about applying language models for generating recipe titles/ideas. Specifically, we don't use LLMs, and this turns out to be a bit of a controversial decision, but one that has it's own advantages. Quite interested in the community's take on it: https://engineering.hellofresh.com/recipes-and-generative-ai-6d74a107860c submitted by /u/abnormdist [link] [comments]  ( 9 min )
    [R] Tokenizer Choice For LLM Training: Negligible or Crucial?
    📷Research https://arxiv.org/abs/2310.08754 While the recent success of LLMs has been driven primarily by curation of training dataset composition, scaling of model architectures and dataset sizes, and advances in pretraining features, the impact of tokenizers has often lagged as a blind spot. Our researcher*s study sheds light on this issue and shows that tokenizer choice can significantly impact downstream model performance as well as training and inference costs. 1️⃣ Investigation of intrinsic tokenizer performance, i.e., study of tokenizer properties (i.e., generated vocabulary), and tokenization results of tokenizers. 2️⃣ Investigate the extrinsic performance of the tokenizer, i.e., the impact of the tokenizer on the downstream performance of the model. 3️⃣ Investigation of possible correlation between intrinsic and extrinsic tokenizer performance. ​ 💡 The investigation shows that the common tokenizer evaluation metrics "fertility" and "parity" do not always predict the performance of the downstream model, making these metrics a questionable criterion for tokenizer evaluation. 💡 Moreover, the study shows that multilingual tokenizers - which are based on the five most common European languages - require a vocabulary size by a factor of three compared to English. The previous approach of training tokenizers with English vocabulary only thus turns out to be inefficient and results in a strong performance degradation and additional training costs of up to 68% submitted by /u/effi28_ml [link] [comments]  ( 9 min )
    [R] Using Machine Learning to Drive Portfolio Asset Allocations
    I'd love to hear your guys thoughts on next steps to improve this, maybe deeper layers and more nodes, maybe a random forest is more appropriate? I'd love to hear any thoughts on Machine Learning directly applicable to time-series data. https://www.quantitativefinancialadvisory.com/post/asset-allocation-in-a-post-modern-portfolio-theory-world-part-1-the-single-layer-taarp-ml-model The Main Idea We will develop a Machine Learning model, specifically a deep learning model (more hidden layers to come), to periodically, tactically rebalance the weights of our portfolio based on observable market data and empirically determined statistics combined with feature engineering from the past 21 trading days, and for the VIX we consider its characteristics since inception. The output will be a range representing the degree to which we bet long, short, or hold cash, and 3 weights that sum to less than or equal to one and greater than or equal to negative one. In essence we will allow shorting of securities and not require our portfolio to be fully invested. Cash is an active position; sometimes the best investment is staying on the sidelines. The model will allow one input layer, one and two hidden layers (to show that more might not always be better, explicitly with the 200 variable maximum excel solver imposes on us), and an output layer with 3 nodes outputting a value between -1 and +1 with -1 representing a full allocation to a short position in the security and +1 representing a fully allocated long position. submitted by /u/QFA_official [link] [comments]  ( 9 min )
    [D] Are people in ML Phds still happy?
    As an outsider who has many friends in ML Phds, this is my perspective of their lives: long hours, working nights, weekends no work-life balance, constant fear of being scooped and time pressure from deadlines frustrating broken review systems many incremental, advertisement papers that produce very little actual contribution (which is justified by 2.) "engineering" and not "science" all this pressure amounts to severe imposter syndrome Are people in the field still happy? Where do people get their satisfaction? To me it looks like almost like a religion or a cult. The select few who say, get neurips outstanding paper are promoted to stardom - almost a celebrity status while everyone else suffers a punishing work cycle. Are the phd students all banking on AGI? What else motivates them? Edit: the discussion is about whether 1-6 are worse in ML than other fields (or even the median experience). The reference for "other field" is highly heterogenous. Experience obviously varies by lab, and then even by individuals within labs. "It happens in other fields too" is a trivial statement - of course some version of 1-6 affects somebody in another field. Edit 2: small n but summarizing the comments - experience seems to differ based on geographic region, one's expectations for the phd, ability to exert work-life balance, and to some extent ignore the trends others are all following. Some people have resonated with problems 1-6, yet others have presented their own, anecdotal solutions. I recommend reading comments from those who claim to have solutions. submitted by /u/shenkev [link] [comments]  ( 9 min )
    [P] A PDF tool that supports three retrieval strategies, allowing users to choose the answer that suits them best
    ➡️ Check on https://huggingface.co/spaces/xuyingliKepler/VecDBCompare 📌 Introduction: VecDBCompare is a streamlit-based application designed to evaluate and compare three different vector database retrieval strategies. Users only need to upload a PDF and interact with QABots using three different strategies to determine which strategy is most suitable for them. ⭐️ Three retrieval strategies: Chunk Strategy: Divides the document into small chunks and retrieves based on the most relevant chunks. Summary Strategy: Summarizes the document and retrieves based on the summary content. Hypothetical Question Strategy: Generates hypothetical questions that the document might answer and retrieves based on these questions. submitted by /u/xuying_li [link] [comments]
    [D] [P] 3D Design file labelling and classification for manufacturing
    I have ~1 million 3D design (.STP and/or .OBJ) files of various parts for medical devices, aerospace, automotive or defense systems. I'd like to label them based on appropriate manufacturing methods that are used to physically make them. Some example methods and labels would be milling, turning, injection molding, cnc machining, etc. After labelling, I'd like to architect a system to produce these labels as inference for a new part that has not been physically made yet. My team (<5 people) have manufacturing domain expertise and can manually label these parts but I'm looking for a more scalable solution that isn't as time consuming. Crowd sourced methods like Mechanical Turk won't work because annotators do not have the domain knowledge to mark the correct label. Labelling platforms like SageMaker/Azure ML Studio only allow image/text/audio datasets, is there a platform that'll help me setup labelling tasks for 3D designs? Furthermore, how can I find more experts that can help scale this up? It seems to me that the only option is to build my own labelling app as an annotator needs these key features - 3D model visualizer so they can spin the part and view any orientation Draw a bounding box (commonly available in other platforms) Toggle measurements in inches/mm As for label classification I'm looking at architectures like PointNet since my dataset of meshes can be sampled to point clouds. Are there other methods that would work better or worth exploring? Open to any and all suggestions across this pipeline. ​ ​ submitted by /u/rootcage [link] [comments]  ( 9 min )
    [D] Undergrad seeking advice on ethics/ML research
    I’m an undergraduate who’s considering a PhD student in ML. I’m currently in a lab that focuses on ethics in AI. While I love the work, it focuses on the humanities side of CS. I’ve always been a more mathy person and have always been interested in theoretical ML research. I’d like to combine ethics & AI/ML in some way (eg studying explainable AI from the technical perspective). I was wondering what are some research areas that combine the two and if I don’t work in academia, what’s the market and job prospects like for someone who does this? submitted by /u/SnooChipmunks1902 [link] [comments]  ( 9 min )
  • Open

    Rewards in Montezuma's Revenge
    Hello all, I'm working on Montezuma's Revenge using the Gymnasium API. I wonder if there's anyone here that knows the numerical value of the rewards? And if so, how they are typically scaled down. ​ Thanks! ​ G_bes submitted by /u/G_bes [link] [comments]
    The N Implementation Details of RLHF with PPO
    submitted by /u/vwxyzjn [link] [comments]
    Creating a Custom Environment in Unreal Engine 5
    Hello, I would like to create my own environment (Maze), in which I would like to train my drone using reinforcement learning, I am kind of new and I don't know how can I set the state space, rewards, and if I would like to use BS3 for training then how can I connect the environment? And for the agent which is the drone, should i just do the AirSim build.cmd and take the agent from there and place the starting position flag or what? I am a bit lost and I can't find tutorials on how to do this, I'd appreciate it if you could provide some guidance. Thanks in advance. submitted by /u/Gabii99 [link] [comments]
  • Open

    Intelligent document processing with Amazon Textract, Amazon Bedrock, and LangChain
    In today’s information age, the vast volumes of data housed in countless documents present both a challenge and an opportunity for businesses. Traditional document processing methods often fall short in efficiency and accuracy, leaving room for innovation, cost-efficiency, and optimizations. Document processing has witnessed significant advancements with the advent of Intelligent Document Processing (IDP). With […]  ( 20 min )
    T-Mobile US, Inc. uses artificial intelligence through Amazon Transcribe and Amazon Translate to deliver voicemail in the language of their customers’ choice
    This post is co-authored by Dhurjati Brahma, Senior Systems Architect at T-Mobile US, Inc and Jim Chao, Principal Engineer/Architect at T-Mobile US, Inc and Nicholas Zellerhoff Associate Systems Architect at T-Mobile US, Inc. T-Mobile US, Inc. provides a Voicemail to Text service to its customers, which allows customers to quickly read through their voicemails and […]  ( 7 min )
  • Open

    DSC Weekly 24 October 2023
    Announcements Top Stories In-Depth The post DSC Weekly 24 October 2023 appeared first on Data Science Central.  ( 21 min )
    Seamless integration of data from unconventional source systems into Business Intelligence using data science techniques
    Written by Venkata Nori and Kshitij Gopali. Introduction As technology is evolving, most companies in the world are adopting advanced mechanisms for their daily tasks of storing/updating data, project management & tracking, incident management, version control, etc. Periodically, these companies’ business stakeholders would want to extract and analyze the data to see how the business… Read More »Seamless integration of data from unconventional source systems into Business Intelligence using data science techniques The post Seamless integration of data from unconventional source systems into Business Intelligence using data science techniques appeared first on Data Science Central.  ( 25 min )
    How data science and medical device cybersecurity cross paths to protect patients and enhance healthcare
    A recent interview by Medical Device Network with GlobalData medical analyst Alexandra Murdoch shares interesting insights into cybersecurity for medical devices. The post How data science and medical device cybersecurity cross paths to protect patients and enhance healthcare appeared first on Data Science Central.  ( 22 min )
    Skills required to excel in a business analytics career
    In the contemporary business landscape, where data is heralded as the new oil, Business Analytics has emerged as a pivotal domain, steering organizations towards informed decision-making and strategic planning. business analytics encompasses the utilization of data, statistical algorithms, and machine learning techniques to comprehend the business context, forecast future trends, and facilitate optimal decision-making. The… Read More »Skills required to excel in a business analytics career The post Skills required to excel in a business analytics career appeared first on Data Science Central.  ( 22 min )
    GenAI: The game-changer in data analytics
    In an era where data drives decisions, GenAI emerges as a prodigy force in the realm of data analytics. According to Statista, LLM’s market size is expected to show an annual growth rate of 24%, resulting in a market volume of $207 bn by the end of 2030.  This cutting-edge technology, built on sophisticated algorithms… Read More »GenAI: The game-changer in data analytics The post GenAI: The game-changer in data analytics appeared first on Data Science Central.  ( 22 min )
  • Open

    Animated AI
    submitted by /u/nickb [link] [comments]
  • Open

    Best of N series
    A couple days ago I wrote about the likelihood of the better team winning a best-of-five or best-of-seven series. That is, if the probability of X winning a game against Y is p > ½, how likely is it that X will win a majority of 5 games or a majority of 7 games. This […] Best of N series first appeared on John D. Cook.  ( 6 min )
    Lessons from Skylab
    I discovered the Space Rocket History Podcast a while back and listened to all the episodes on the Apollo program. I’m now listening to the episodes on Skylab as they come out. I came for Apollo; I stayed for Skylab. I would not have sought out the episodes on Skylab, and that would have been […] Lessons from Skylab first appeared on John D. Cook.  ( 6 min )
    Curvature: principal, Gauss, and mean
    This post will compute the center of curvature for an object described in the previous post. In order to do that we first need to describe principle curvature and Gauss curvature, and we’ll throw in mean curvature while we’re at it. Let S be a surface sitting in three dimensional space. No need for more […] Curvature: principal, Gauss, and mean first appeared on John D. Cook.  ( 6 min )
    An algebraic superegg
    One year ago I wrote about a variant of the squircle that is quantitatively close to the customary definition but that has nicer algebraic properties. That post used the term p-squircle for the usual squircle with equation where p > 2, and the term s-squircle for the variation with equation where s is between 0 […] An algebraic superegg first appeared on John D. Cook.  ( 5 min )
  • Open

    On Razer’s Edge: VFX Star Surfaced Studio Creates Stunning Sci-Fi World This Week ‘In The NVIDIA Studio’
    Visual effects artist Surfaced Studio returns to 'In the NVIDIA Studio' to share his real-world VFX project, created on a brand new Razer Blade 16 Mercury Edition laptop powered by GeForce RTX 4080 graphics.  ( 8 min )

  • Open

    [P] Traffic signs in ecognition developer
    Hi community, first time posting here. I'm working on a project for the segmentation and classification of traffic signs using eCognition Developer software. I need help with creating scripts to apply three classifiers: Naive Bayes, SVM, and Random Forest. I'd like to know how I can implement these classifiers in eCognition Developer and where to insert the scripts in the software. Does anyone have experience with this software and could share script examples or provide guidance on how to accomplish this task? Sorry English is not my first language. Tldr, i need to include the Bayes classifiers, Random Tree, and SVM in eCognition Developer (for segmentation and classification - prediction). submitted by /u/Dignai [link] [comments]  ( 9 min )
    [P] Using gpt4docstrings to generate docstrings for entire projects
    gpt4docstrings is a Python library that allows you to write docstrings for functions / classes non documented in your codebase. In this case, I'm applying the library to one module of langchain to see the results. Repo: https://github.com/MichaelisTrofficus/gpt4docstrings https://i.redd.it/78f3wit071wb1.gif submitted by /u/Hefty-Consequence443 [link] [comments]  ( 9 min )
    [P] DQN with a binary vector as output
    Heey everyone! I hope you're doing well. I need your help guys. I'm working on a DQN that outputs a binary vector of length L (I just applied sigmoid function on the ouptut layer and take p>0.5 as 1 and 0 otherwise). In this setting, at each decision time, the agent returns a list containing the indices of selected elements. Knowing that the list's length is dynamic how can I train my DQN ? (I am facing issues in this). Is there any alternative way to do this purpose (like DDPG :/ )? submitted by /u/GuavaAgreeable208 [link] [comments]  ( 9 min )
    [Project] Looking for AI/ML engineers to team up for a fallow deer identification project
    Hi, first of all, sorry for the cross post, but I guess Huggingface forums were not the right place to begin with and it took me a while to find out where things about AI/ML are being actively discussed. I am a professional software developer (C, Python on Linux) and while I did try out a few things with PyTorch and Diffusers - I am not an ML engineer, so I am looking for someone with ML expertise who’d be interested to team up for a non commercial open source project. I can do quite a lot around application development, but I clearly lack the required ML knowledge. I followed the free MIT ML courses on YouTube, did some reading, tried things out, but the ML part of this project is for sure over my head. So, here’s what I have in mind: I would like to create an application which would b…  ( 11 min )
    [D] Using SQL to monitor ML models
    Hello, We are running a number of machine learning models in production and would like to monitor some metrics during inference: Data quality, inference time, accuracy, etc. All these metrics could be recorded in the python code and we are planning to build a SQL database that will receive all the information so as we can visualize in grafana. Do you think this is a good pattern? What would you suggest instead (we are using AWS). Thank you in advance. submitted by /u/Eddas123 [link] [comments]  ( 9 min )
    [R][P] Trying to understand the generative properties of autoencoders
    A while back, I came across the "From Variational to Deterministic Autoencoders", which provided a novel insight into the generative properties of autoencoders by framing the objective through the lens of regularization. However, I couldn't help but notice that the deterministic models studied felt incomplete, namely due to the inherent lack of sampling in those models (which is something that the authors acknowledge). To provide a short recap of the paper, the authors surgically decompose the variational autoencoder objective into a deterministic one. They start with a Constant-Variance VAE, which is a special case of the general Gaussian latent VAE where the noise standard deviation of the latent distribution is fixed to 1. This leads to what is essentially a standard autoencoder with t…  ( 10 min )
    [D] What is the lowest possible loss for a language model?
    Example: Suppose a character-level language model (three input letters to predict the next one), trained on a dataset that contains three instances of the sequence aei, with two occurrences preceding o and one preceding u, i.e., the dataset is: Input Output aei o aei u aei o In this case, the ideal probability distribution for the model's logits for aei would be ~0.66 for o, ~0.33 for u, and zero for other letters. In other words, when the model is input with aei, the ideal softmax of the logits would be ~0.66 for o, ~0.33 for u, and zero for other letters. Following this reasoning, the objective is to optimize the model's output for a given input to match the distribution of occurrences in the dataset. If this reasoning is correct, then we have the following ideal loss (cross-entropy): https://preview.redd.it/pzpxogcqd0wb1.png?width=330&format=png&auto=webp&s=b0b6c3b5fbfb4797c11a1f26375065ce883551d3 Thus, ~0.63 is the smallest loss we can get with this dataset. Is my reasoning correct? submitted by /u/viniciusarruda [link] [comments]  ( 9 min )
    [D] Tanh activation function outputs the same value for any given input
    Basically im working on the DDPG algorithm in DRL where i have an actor and critic networks. The actor network architecture is quite simple: Input layer contains 22 neurons that represents the state values (ranging from 0.1 to 10.0 max not normalizing them) Two hidden layers with 128 neurons, with Leaky Relu activation (alpha = 0.01), and with HeUniform kernel initialzer Output layer with a single neuron has tanh activation, using Glorot kernel initialzer The critic network has the same architecture but we only concatenate the 22 state values with the action produced by the actor, the only difference is the ouput of the critic has no activation. And both networks use Adam. The problem arises when the training starts because i run a few steps without actually start the learning, but when the learning starts, the actor converges quickly to output values 1 or -1 afor any given input. I tried many learning rates for both actor and critic. One thing to note is when i set the actor learning rate to 1e-5 and the critic to 1e-3 the networks sometimes converges quickly, some time it takes longer to converge and sometimes it does not converge. submitted by /u/Desert_champion [link] [comments]
    [P] Fine-tuning VAEs on limited data
    I have been looking for a pre-trained VAE (on Imagenet with ResNet/VGG) or similar which I could fine-tune on my smaller dataset. However, not only there does not exist many such pre-trained weights but the practice of fine-tuning VAEs does not really seem mainstream. Is there a reason why VAEs are not pre-trained/fine-tuned? Does it have to do with posterior collapse? submitted by /u/unholy_sanchit [link] [comments]  ( 9 min )
    [D] Smart pooling for Visual Transformers
    There is an architecture for images/videos called MViT, where 2D MaxPooling layers are added to reduce computations for ViT. But MaxPooling has a drawback - it discards information independently of context, equally discarding information from both important and uninformative parts of the image. For traditional Conv2D networks, there's little we can do about this, but for transformers, we can reduce dimensionality in a more meaningful way - discarding only those elements that don't carry unique information. Are there any articles/developments on this topic already? submitted by /u/Dependent_Bluejay_45 [link] [comments]  ( 9 min )
    "[Research]" RVC AI Training
    Hello, I'm currently using RVC AI, and I'm about to record myself for the training. What is the best way to record myself except the singing and talking at least for 15 minutes like the guide says. Do I have to make it 20 min and one audio file or do I have to make it 20 min and maybe 10 files with 2 minutes each file? Also, can I multiply my files and reach the 15-20 minutes of audio that it's required or I have to make a different talking or singing for every audio? submitted by /u/WeldFrenzy [link] [comments]  ( 9 min )
    [D] RAG oriented fine-tune... Searching for coherence
    Still searching for a model that is well enough to make RAG... Lots of good models on huggingface, but none of them is trained to return extracted text or answers based on provided info without hallucinating something. Is quite frustrating, every week came a new version of a model that is amazing for Role play and storytelling... (some good progress also on coding...) I see lots of efforts in different RAG strategy, improving semantic search and Chunking, but the open source community still does not have a decent model fine tuned for that. I have considered the idea of make that fine tune, based on synthetic data (using Wikipedia as knowledge base), but unfortunately I have not enough funds to cover the api cost neighter to pay for some decent Gpu. I'm not going to train a 7B Model because the under 30B imho doesn't have many sense if the coherence is the main requirements. Unpopular opinion: as coherence, code llama 34B is much better to any of the 70B fine tune. Sorry to everyone for the rant... Does anyone have some tips or suggestions? Thanks in advance! Edit: My database is composed mainly by abstracts of papers and medical textbook. I admit that the domain is quite complex, but the error rate is too high. Obviously that even if prompted to avoid that (tried and refined multiple prompts, using different prompt format). Gpt3.5, Claude instant and Palm2-Bizon work fine for that task. (obviously GPT4 and Claude 2 would be best, but too expensive for me) I spent lots of time to make a solid embedding pipeline: advanced chunking, Metadata added by llm, text for similarity search different from text provided to LLM, instructor bi encoder to generate embeddings(INSTRUCTOR-XL), reranking using cross encoder, RAG-Fusion using multiple query and HyDE approach Hybrid search with BM25 So... I'm a bit frustrated that i can not run all locally, became that is a must for my project. submitted by /u/Distinct-Target7503 [link] [comments]  ( 10 min )
    [R] 2x the context length of ALiBi through position interpolation
    https://arxiv.org/abs/2310.13017# Linear position interpolation helps pre-trained models using rotary position embeddings (RoPE) to extrapolate to longer sequence lengths. We propose using linear position interpolation to extend the extrapolation range of models using Attention with Linear Biases (ALiBi). We find position interpolation significantly improves extrapolation capability on upstream language modelling and downstream summarization and retrieval tasks. submitted by /u/jwan584 [link] [comments]  ( 9 min )
    [D] How to make research publication more reproducible?
    As context, I'm personally working on a project to make ML/AI research publication more reproducible. We're backed by Balaji Srinivasan (https://twitter.com/balajis) at the level of funding and advice. It seems like, despite attempts like Jupyter Notebooks or sites like Papers with Code, most published research in ML still isn't setup to be easily reproducible. Even companies like Anthropic/OpenAI don't put much of an emphasis on reproducibility, even though it's in their interest to do so to earn public trust. Our current hypothesis is to conceptualize reproducible research as software testing. Specifically we're thinking of building tools that let you internally test the robustness of results, and externally publish them s.t. they're reproducible. You can think of it as continuous integration for reproducible research; e.g. BuildBot for Reproducible Research. One specific idea I have is to build a model evaluation/testing platform that lets you: Internally eval LLM models on open benchmarks (TruthfulQA, AGIEval, etc.) Test robustness of results under different assumptions Externally publish reproducible results I don't have a background in ML research. So I'm looking to get input from research engineers on what challenges/barriers currently exist with model testing and publishing reproducibly — so I thought I'd reach out in this community if anyone's open to that! Let me know if this post doesn't conform to the rules, or if this should go somewhere else. submitted by /u/manveerbasra [link] [comments]  ( 9 min )
    [P] Image Captioning Model
    Hello everyone, I am currently trying to find suitable image captioning and visual question answering models to implement in my project. After a quick google search I came across BLIP2 from hugging face however, its a very large model overall and both my pc and colab could never load its lightest pretrained version. Does anyone know any similar pretrained models for the specific tasks or any other way to load this kind of large model? (I tried loading it with 8bit precision which still failed) I have 16gb of RAM and the task requires image captioning and the ability to ask the model details about the specific image. Any help is greatly appreciated!! submitted by /u/Spitefulsalamander [link] [comments]  ( 9 min )
    [D] Episodic Training vs. Random Sub-Sampling in Few-Shot Learning
    I'm new to few-shot learning and I'm having trouble understanding why prototypical networks use a random sub-sampling approach while the vanilla few-shot learning approach uses episodic training. Doesn't random sub-sampling fail to guarantee that data overlapping won't occur? submitted by /u/The_Aoki_Taki [link] [comments]  ( 9 min )
    [D] High-temperature softmax
    I implemented a label propagation algorithm which is mainly used in the field of Video Object Segmentation (VOS). Basically I provide the labels for one frame and ask my model (using pre-trained encodings of frames) to do semantic segmentation on all the other frames of a video. I am obtaining consistently better results using an high temperature softmax when computing the similarity between pixels of different frames. Then the top-k similarities of each pixel (features) are used to propagate the labels from one frame to the next. I will not disclose the dataset I am using but let's say it is noisy (let's say also low quality). I want to understand why an high-temperature softmax performs better than a softmax with T=1 or an extreme T = 0.01. At the moment I get better results with T = 10, 100 and the trend in my grid search shows that even higher T could be possible. I was wondering if the model is still considerable valid if T is too high. I feel like the model is almost randomly guessing, if T is too high, but this apparently enhances performance. Every help is appreciated. Also literature about the topic! I only found one paper (which uses an high-temperature softmax to distill knowledge in a student-teacher network for remote sensing imagery) submitted by /u/darthjeio [link] [comments]  ( 9 min )
    [D] Callbacks in tensorflow v1
    Hi everyone, I have some old code written in tf1. It has not been ported to tf2 or pytorch yet. Does anyone of you have leads on whether one can implement custom callback for tf1 code and if there are any examples on the web? Thanks in advance. submitted by /u/wrik003 [link] [comments]  ( 9 min )
    [N] CAPIVARA: Cost-Efficient Approach for Improving Multilingual CLIP Performance on Low-Resource Languages
    In the tech report of GPT4, an analysis was conducted on the impact of different languages on model performance. These effects are attributed to the amount of data and language characteristics. This also indicates that the model's effectiveness may not meet the expectations of users in different languages. The problem addressed in this paper is of significant importance. https://preview.redd.it/s48419fe9yvb1.jpg?width=2748&format=pjpg&auto=webp&s=ba76f1bd18043c6cb2610ed90f5c41a78b5ccd95 Arxiv: https://arxiv.org/abs/2310.13683v1 Stay updated with AI in a fun-to-listen way. Check out ai-dailynews.com to generate your personalized news podcast🎙. It's one of my open-source projects and takes no charge. submitted by /u/xuying_li [link] [comments]  ( 9 min )
    [N] Neural-Base Music Generation for Intelligence Duplication
    The paper employs a deep learning system to learn from the great composer Beethoven and capture his composition ability in a hash-based knowledge base. This new form of knowledge base provides a reasoning facility to drive the music composition through a novel music generation method. https://preview.redd.it/l9gzcoe38yvb1.png?width=1944&format=png&auto=webp&s=d6c5ca7f8fe434be1187c1f0440c5a94ebfc9b64 Arxiv: https://arxiv.org/abs/2310.13691v1 For more AI updates, check out this AI-generated news podcast🎙 tailored to your preferences(ai-dailynews.com), which is open source and free. submitted by /u/xuying_li [link] [comments]  ( 9 min )
    [D] Biclustering with the same row and column clusters
    The biclustering algorithm partitions rows and columns of a matrix into clusters so that the variance inside each intersection between row and column clusters in minimized. I want to perform the biclustering of a matrix, but additionally to enforce that the row and column clusters are the same, i.e. if the row i lies inside a row-cluster c then the column i must lie in a column-cluster c. Rows and columns in the matrix represent the same entities (but the matrix is non-simmetric). sklearn implementation does not support such a constraint. Are there any algorithms for this at all? submitted by /u/Tomarchelone [link] [comments]  ( 9 min )
    [D] Referenceless NLP Evaluation
    Hey all, I'm building this open source project that helps ML engineers evaluate LLM applications (its like unit testing for LLMs), and it works great in development since users can just write a test_file.py like how you would normally do it in pytest, but as I'm going onto the next phase I'm thinking how to bring evaluation to production, especially on metrics such as factual consistency where I need a ground truth. I'm hoping to get some ideas around this. Here's a link to the repo (https://github.com/confident-ai/deepeval) if you want more clarity on what the package looks like, but most importantly any help to brainstorm production evaluation will be greatly appreciated. Thank you very very much! submitted by /u/Ok_Constant_9886 [link] [comments]  ( 9 min )
    [D] Is Computer Vision dead? - “Quo Vadis, Computer Vision?”
    In ICCV23, several top notch researchers shared their insights (in a workshop called “Quo Vadis, Computer Vision?”) wrt the current state of Computer Vision, especially in light of the meteoric raise of LLMs. Has CV stalled? Is CV dead? E.g.MIT’s professor Bill Freeman, has some interesting points on foundation models: “FM aren’t fundamental, therefore not stable". Jitendra Malik argues "video can describe the world better than text." submitted by /u/btcmx [link] [comments]  ( 9 min )
    [R] Biologically plausible vision models for classification and grasping tasks
    Hey everyone! I am looking for papers that propose or explore biologically plausible vision models, primarily tasks like classification and grasping (predicting grasping bounding boxes) tasks. By biologically plausible, I mean papers that propose models inspired by the human brain in some way or the other. I know convolution is loosely inspired by human cognition, but everything I can find seems to suggest the opposite for ViT like models. I have come across certain papers like these: - https://arxiv.org/abs/1901.00945 - https://proceedings.neurips.cc/paper/2020/hash/98b17f068d5d9b7668e19fb8ae470841-Abstract.html But I am still looking for more. Any suggestions? submitted by /u/Far_Clothes_5054 [link] [comments]  ( 9 min )
    [D] Understanding the math behind diffusion models
    I was trying to comprehend the math behind this paper: https://arxiv.org/pdf/2006.11239.pdf. You can see in the equation corresponding to the forward diffusion process, at each time step, the image in the previous step is also scaled by sqrt(1-beta_t) while adding noise. It seems like the purpose of this is to maintain a fixed variance (or specifically, unit variance) at each time step. My question is: What is the significance of maintaining unit variance at each time step? Why is this useful? I saw somewhere that this is done to prevent the variance from "exploding." I don't really know what this means. I guess the variance keeps on increasing if the scaling isn't done. But why is this bad? submitted by /u/fallendeviL701b [link] [comments]  ( 9 min )
    [D] Neural Attention - One simple example that explains everything you need to know
    submitted by /u/AvvYaa [link] [comments]  ( 9 min )
    [D] Has anyone tried deploying FastAPI v2 with a BERT model on the NVIDIA Triton Inference Server?
    I'm not sure how to enable BERT with flash attention during the start-up of the Triton server in order to accelerate inference. Dao(the author of FA) told me he’s never tried. submitted by /u/g14loops [link] [comments]  ( 9 min )
  • Open

    Etsy Taking Stores Down as it's Bot Can't Tell Which Mockups are Real and Which ones are AI Generated
    If you are an Etsy seller or know someone who sells on Etsy, or maybe you went on Etsy and your favorite store is gone, could be due to the Etsy bots taking down stores for not figuring out properly which Mockup Images are real and which ones are AI Generated. All you have to do to find this out is go on youtube or social media and look for "etsy mockups news". Also Etsy has been pretty quiet about this and as a result Etsy sellers are going crazy about this as no one knows why some stores who haven't used AI to create their mockups are being targeted by these bots. This just goes to show how hard is getting to distinguish between what is real and what is AI generated and how across all industries companies are having issues adapting to AI technology changes. Thoughts? submitted by /u/fk1220 [link] [comments]
    New data poisoning tool lets artists fight back against generative AI
    Nightshade is a new data poisoning tool that allows artists to fight back against generative AI models. By adding invisible changes to the pixels in their art, artists can cause chaos and unpredictable results in AI models that use their work without permission. The tool, called Nightshade, is intended as a way to fight back against AI companies that use artists’ work to train their models without the creator’s permission. Using it to “poison” this training data could damage future iterations of image-generating AI models, such as DALL-E, Midjourney, and Stable Diffusion, by rendering some of their outputs useless—dogs become cats, cars become cows, and so forth. AI companies such as OpenAI, Meta, Google, and Stability AI are facing a slew of lawsuits from artists who claim that th…
    I would like to upload 100+ one-hour-long podcasts in MP3 and get a 1-page summary of the most important points discussed in each episode — what's the best way to go about doing this?
    ChatGPT and Bard are cool, but I have to manually feed them transcripts generated by Whisper to get summaries. Furthermore, since the length of the transcript is often longer than the maximum character limit(s), I have to add additional prompts in between copying and pasting multipart transcripts. Since these recordings are 10–15 years old, the audio quality isn't the best, but I think it's sufficient to generate transcripts + detect speech, if not, I might need an additional "audio cleaning" step as well. I don't mind paying, and I'm above average in technical ability, so if anyone has any suggestions, I'd love to hear them. Here's what the workflow would look like: INPUT: I will upload a folder containing 100+ MP3 files of podcasts with below-average audio quality. OUTPUT: I would like to get a Google Doc or a Text file with 1-page summaries of the most important points in bullet-point format corresponding to each episode. Each page should be separated by some sort of divider, and the header should contain the filename for reference. Ideally, there should be an existing Jupyter Notebook I could throw in Google Colab and do all of the above in a plug-and-play manner, but if not, I'd love to hear your thoughts. Any tips? Thanks! submitted by /u/aknalid [link] [comments]
    The dilemma of potential AI consciousness isn't going away - in fact, it's right upon us. And we're nowhere near prepared. (MIT Tech Review)
    https://www.technologyreview.com/2023/10/16/1081149/ai-consciousness-conundrum/ "AI consciousness isn’t just a devilishly tricky intellectual puzzle; it’s a morally weighty problem with potentially dire consequences. Fail to identify a conscious AI, and you might unintentionally subjugate, or even torture, a being whose interests ought to matter. Mistake an unconscious AI for a conscious one, and you risk compromising human safety and happiness for the sake of an unthinking, unfeeling hunk of silicon and code. Both mistakes are easy to make." "Every expert has a preferred theory of consciousness, but none treats it as ideology—all of them are eternally alert to the possibility that they have backed the wrong horse." "The trouble with consciousness-­by-committee, though, is that this state of affairs won’t last. According to the authors of the white paper, there are no major technological hurdles in the way of building AI systems that score highly on their consciousness report card. Soon enough, we’ll be dealing with a question straight out of science fiction: What should one do with a potentially conscious machine?" "For his part, Schwitzgebel would rather we steer far clear of the gray zone entirely. But given the magnitude of the uncertainties involved, he admits that this hope is likely unrealistic—especially if conscious AI ends up being profitable. And once we’re in the gray zone—once we need to take seriously the interests of debatably conscious beings—we’ll be navigating even more difficult terrain, contending with moral problems of unprecedented complexity without a clear road map for how to solve them." submitted by /u/kamari2038 [link] [comments]
    The Future of AI Voice Technology
    submitted by /u/Amandacerni [link] [comments]
    UK officials use AI to decide on issues from benefits to marriage licences
    submitted by /u/sky_badger [link] [comments]
    One-Minute Daily AI News 10/22/2023
    A new AI agent Eureka developed by NVIDIA Research that can teach robots complex skills has trained a robotic hand to perform rapid pen-spinning tricks — for the first time as well as a human can.[1] Meta’s Habitat 3.0 simulates real-world environments for intelligent AI robot training.[2] South Korea’s SK telecom Co. will collaborate with Deutsche Telekom AG to jointly develop a telecommunications-specific artificial intelligence (AI) large language model (LLM) as competition intensifies among local telecom companies to expand overseas with their own AI capabilities.[3] Scientists say they have built an artificial intelligence (AI) tool that can successfully identify and confirm supernovas.[4] Sources: [1] https://blogs.nvidia.com/blog/2023/10/20/eureka-robotics-research/ [2] https://siliconangle.com/2023/10/20/metas-habitat-3-0-simulates-real-world-environments-intelligent-ai-robot-training/ [3] https://pulsenews.co.kr/view.php?year=2023&no=810112 [4] https://learningenglish.voanews.com/a/researchers-build-first-tool-to-discover-supernovas/7318435.html submitted by /u/Excellent-Target-847 [link] [comments]
    How To Earn $1M+ By Using AI To Write Books
    I've been using ai for a long time, it often helps me to reduce my work time, but I want to try to earn money and decided to make an investigation. I want to hear your opinion on my analysis, and maybe this post will help someone in starting a business through ai Joe Popelas, a very young entrepreneur, has made over a million dollars within the last year selling AI-generated books online. I literally got fascinated by how simple yet powerful it is with these tools to create a book within a matter of a few hours. Joe Popelas is one of a new breed of AI entrepreneurs who capitalized on the democratization of large language models. Joe's story demonstrates the power of combining human creativity with AI. While AI tools did the heavy lifting for his initial drafts, Joe spent time refining …
  • Open

    From text to dream job: Building an NLP-based job recommender at Talent.com with Amazon SageMaker
    This post is co-authored by Anatoly Khomenko, Machine Learning Engineer, and Abdenour Bezzouh, Chief Technology Officer at Talent.com. Founded in 2011, Talent.com is one of the world’s largest sources of employment. The company combines paid job listings from their clients with public job listings into a single searchable platform. With over 30 million jobs listed […]  ( 12 min )
  • Open

    Street View to the Rescue: Deep Learning Paves the Way to Safer Buildings
    Images such as those in Google Street View are taking on a new purpose in the hands of University of Florida Assistant Professor of Artificial Intelligence Chaofeng Wang. He’s using them, along with deep learning, in a research project to automate the evaluation of urban buildings. The project aims to help governments mitigate natural disaster Read article >  ( 6 min )
  • Open

    How to properly evaluate competitive MARL?
    Hello, everyone! I'm building a MARL agent for a zero-sum game and I'm having a hard time evaluating it. I managed to quickly train it for a simple case and I could manually verify that it was actually learning the optimal decision making because I already know how the game works and, for this simple case, I know that there actually is a mathematically correct way to play it (from both sides) and how it should be played, but that isn't true for most cases (and even if it was, I wouldn't be able to manually verify thousands of games). To complicate things even more, there are billions and billions of possible initial states. For single-agent RL, I could set a reward threshold (if I knew which was the maximum reward possible) or at least I could set a maximum time of "no improvement" but, in a zero-sum game, the sum of the policy rewards is, well, zero. I could think of two solutions: Evaluate convergence to Nash Equilibrium on a subset of the possible initial states, which could be a problem because I'm not sure if the game dynamics guarantee the existance of Nash Equilibria; Evaluate convergence of the winrate of the trained agent against a "hand-crafted" baseline agent, which could be a problem because the quality of this evaluation method could depend on how well I can make this baseline agent (which won't be even close to optimal, otherwise I wouldn't be training an agent). Any thoughts? submitted by /u/victorsevero [link] [comments]
    Programmatic backdoors: DNNs can use SGD to run arbitrary stateful computation
    submitted by /u/gwern [link] [comments]
    In your opinion, which is the most beautiful form of the Bellman Equation and why?
    Didn't see anything about this kind of post in the rules I'm asking for a tattoo idea haha submitted by /u/victorsevero [link] [comments]
    Inverted pendulum swing-up problem not converging to global optimum using SAC or TD3.
    I am making a thesis about using RL to solve the inverted pendulum swing-up problem. I have tried using TD3, SAC, and TD3-Fork. In my testing, TD3-Fork worked best, I think SAC would also work if I am able to tune the hyperparameters correctly. I would like a similar trained agent to td3 converged where the agent balances the pole almost indefinitely. I have tried the hyperparameters from the website and also different hyperparameters but it has not converged. I am wondering if I am missing something or if there is anything I can do to improve the agent. I have been thinking of using HER instead of FORK. Any help or advice would be appreciated. training reward data The 'maximum' reward that I could get in the simulation is >880. The reward function that I used is -[cos(theta) + 10(|x| > 0.9) + 10(|theta_dt| > 18)]. However, from the data above it only converges to about 837 max and rarely reaches >900. trained td3 fork agent submitted by /u/YEEETTT0708 [link] [comments]
    Godot enables me to do pure C# Deep reinforcement learning.
    submitted by /u/Vae94 [link] [comments]
    [R] Demo of “Flow-Lenia: Towards open-ended evolution in cellular automata through mass conservation and parameter localization” (link to paper in the comments)
    submitted by /u/gwern [link] [comments]
  • Open

    Celebrating Kendall Square’s past and shaping its future
    The 15th Kendall Square Association annual meeting explored new and old aspects of the neighborhood.  ( 9 min )
  • Open

    Nonlinear algebra
    What is nonlinear algebra? Negations are tricky. They may be the largest source of bugs in database queries. You have to carefully think about what exactly are you negating. Any time you see “non-” attached to something, you have to ask what the context is in which the negation takes place. For example, if you […] Nonlinear algebra first appeared on John D. Cook.  ( 6 min )
  • Open

    Are Generalized Self-Supervised ViT Models the Image Objective Counterpart of LLM’s?
    submitted by /u/No-Platypus4021 [link] [comments]
    Neural Networks: A Deep Dive into AI's Building Blocks
    submitted by /u/Emily-joe [link] [comments]
  • Open

    Abstracts: October 23, 2023
    Today on “Abstracts,” Partner Research Manager Andy Gordon & Senior Researcher Carina Negreanu explore new work introducing co-audit, a term for any tool-assisted experience that helps users of generative AI find and fix mistakes in AI output. The post Abstracts: October 23, 2023 appeared first on Microsoft Research.  ( 16 min )

  • Open

    [P] Having GPT-4 Iterate on Unit Tests like a Human
    Hi r/MachineLearning, My name is William and I’m one of the founders of Sweep. Sweep is an AI junior developer that writes and fixes code by mirroring how a developer works. While building Sweep, we used to use the Github API, but we ran into rate limits, so we changed this to clone your repository for the duration of the request. It's now coming full circle. Sweep can now write, run, and debug a failing unit test for the ClonedRepo class! Blog: https://docs.sweep.dev/blogs/ai-unit-tests Video: https://www.youtube.com/watch?v=N9PUxmja9z4 submitted by /u/williamsweep [link] [comments]  ( 9 min )
    [D] Structured learning resources for ML Theory
    So essentially what the title says. I want to truly understand whats happening behind Machine Learning in general and also behind each algorithm specifically (starting from the basics to more advanced things, like Logistic Regression, Decisions trees and random forests, Deep Learning, NLP, GANS...). By structured I mean it contains all the pieces ordered and organized, from the same source, so you can can actually go from the building blocks up, not just a YouTune channel that uploads interesting videos about different machine learning related topics. Regarding the medium, I don't really mind but I would prefer audiovisual content (YT channel/playlists, Lectures, conferences...) but if you really recommend a specific book or series of books that's also okay. If it has some practical focus to it (to better grasp the theory) that would great. Also, I would prefer if it goes deep into the details, but not too deep into the specific maths involved, but if it's the case thats also okay. Regarding price, obviously if it's free that would be awesome, but in the range of free to 40€ is fine. Thank you for your recommendations in advance!! submitted by /u/aleradamantis [link] [comments]  ( 9 min )
    URL PHISHING OR BENIGN USING DEEP LEARNING "[Research]", "[R]", "[Project]", "[P]"
    Guys does anyone have an idea why my model does not work and it's like 50-50 chance to get it right. I'm getting really frustrated. Here is the code so far: ​ import pandas as pd import torch import torch.nn as nn import torch.optim as optim from torch.utils.data import Dataset, DataLoader from collections import Counter from sklearn.model_selection import train_test_split from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score import re from imblearn.under_sampling import RandomUnderSampler # Loading the data file_path = "C:/Users/alex/Desktop/DATASET/malicious_phish.csv" data = pd.read_csv(file_path) # Filtering data filtered_data = data[data['type'].isin(['phishing', 'benign'])] # Undersampling the majority class rus = RandomUnderSampler(rand…  ( 12 min )
    [D] - Pre-Training a 4bit model (NOT Fine-tunning)
    Pre-Training using 4bit (NOT fine-tunning) Hello community! I have been messing around with open source LLM's running them locally using peft and AutoGPTQ in Transformers. I even trained a few QLora models (my favorite part) However my question is this, given the performance of a 4bit model why hasn't there been any research in this area? Is it possible to even create a new model using 4bit altogether? I am sure it's not as easy as it sounds but I haven't seen anyone try. Just curious cause it will open doors for many of us with consumer grade hardware. Thanks! submitted by /u/Delicious-Farmer-234 [link] [comments]  ( 9 min )
    [P] Infinity, a FOSS project for supporting RAG for LLMs and Vector Embeddings.
    https://github.com/michaelfeil/infinity Infinity, a open source REST API for serving vector embeddings, using a torch / ctranslate2 backend. Its under MIT License, fully tested and available under GitHub. I am the main author, curious to get your feedback. FYI: Huggingface launched a couple of days after me a similar project ("text-embeddings-inference"), under a non open-source and non-commercial license. submitted by /u/OrganicMesh [link] [comments]  ( 9 min )
    [R] Combining Thermodynamics and Diffusion Models for Collision-Free Robot Motion Planning
    Researchers from Yonsei University and UC Berkeley recently developed a new AI method for enabling autonomous robots to navigate unfamiliar environments filled with obstacles using only visual data as input. The key innovation is a customized diffusion model. Diffusion models can generate diverse motion plans by adding controlled noise. The researchers tailored the model to mimic how heat avoids insulation when dispersing through space. Similar to heat navigating around insulators, this "collision-avoiding" diffusion model learns to predict robot motions that avoid collisions with obstacles. It generates reachable goals and viable motion plans to those goals simultaneously. In simulations, this approach achieved ~98% success rates in navigating to target destinations while avoiding randomly generated obstacles using only visual map images as input. While extensive real-world testing is still needed (only 2D, only simulation), these initial results showcase promising capabilities: Enables navigation in unfamiliar environments without pre-mapping. Flexibly identifies and progresses toward reachable goals. Avoids unnecessary sensing systems for obstacle avoidance. Learns complex collision avoidance heuristics from visual data. I like the thermo + AI + robotics combination here - takes me back to my days in aerospace engineering. Pretty interesting approach. Full summary is here. Paper here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [R] Speeding up open source LLMs with speculative decoding
    submitted by /u/firef1y1 [link] [comments]  ( 9 min )
    [P] Open Source AI repos that caught my 👀 this week
    @MetaGPT_ github.com/geekan/MetaGPT - multi agent collaboration - MetaGPT encodes Standard Operating Procedures (SOPs) into prompts. The claim is that it takes a one line requirement as input and outputs user stories / competitive analysis / requirements / data structures / APIs / documents, etc. @Ollama_ai github.com/jmorganca/olla… - run large language models locally. The future of AI/LLMs may not be on the cloud, but on your own laptops/mobiles. ollama.ai/blog/building-… @huggingface github.com/huggingface/ca… - slick ML framework for Rust with a focus on performance (including GPU support) @remilouf github.com/outlines-dev/o… - helps developers guide text generation to build robust interfaces with external systems. Provides generation methods that guarantee that the output will match a regular expressions, or follow a JSON schema. github.com/YiVal/YiVal enterprise AI platform submitted by /u/oana77oo [link] [comments]  ( 9 min )
    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator [link] [comments]  ( 9 min )
    [D] Teachers struggle to adapt amid AI revolution in education
    submitted by /u/DutchTechJunkie [link] [comments]  ( 9 min )
    [R] Language interaction to assist in composing and refining music
    Hey guys, I found an interesting paper recently. Universities in the UK introduced Loop Copilot, enabling users to generate and iteratively refine music through an interactive, multi-round dialogue interface. Using language interaction to assist in composing music is very appealing, AI makes a complex workflow easy and automated. https://preview.redd.it/nzvlgevcarvb1.jpg?width=998&format=pjpg&auto=webp&s=815e0f48c299831700215ebdb4257423e317f5ec submitted by /u/xuying_li [link] [comments]  ( 9 min )
    [D] How to account for extreme periods in time series forecasting?
    I am performing a (machine learning) time series forecast on monthly data from the last 20 years. If I separate my data into a train, validation, and test set, the validation set is almost completely filled with extreme values due to the Covid period. How to account for this? submitted by /u/Ambitious-Pay6329 [link] [comments]  ( 9 min )
    [P] Graphing emotion events with LMs for in-depth sentiment analysis
    submitted by /u/helliun [link] [comments]  ( 8 min )
    [D] DINOv2 Breakdown: I've Created a Visual Guide to the Model's Design & a Concise Code Walkthrough
    submitted by /u/CkmCpvis [link] [comments]  ( 9 min )
    [R] Do you read ML/DL/AI related scientific papers? How do you filter them?
    As the title says. Recently, I found a review paper where the authors showed an exponential growth of published papers related to ML or DL. I was wondering if you even read those. If yes what's your way to find good and reliable papers? Do you choose only ones with a significant number of citations? Or just strictly related to your field? If no, why not? https://preview.redd.it/jwjvej5f5qvb1.jpg?width=1080&format=pjpg&auto=webp&s=bf3f7e08e0fe09fe0c6a6fd8d194945b45f5858e submitted by /u/hahahaczyk [link] [comments]  ( 9 min )
    [R] Open-Source Projects on Detecting Landmines
    I know that there are a lot of efforts at the moment to improve the algorithms used for landmine detection. Is anyone aware of any ongoing open-source projects in this space? submitted by /u/Eightstream [link] [comments]  ( 9 min )
    [R] Demo of “Flow-Lenia: Towards open-ended evolution in cellular automata through mass conservation and parameter localization” (link to paper in the comments)
    submitted by /u/hardmaru [link] [comments]  ( 9 min )
    German researchers create DeepMB for faster, high-quality optoacoustic imaging [N]
    Researchers from Germany have developed DeepMB, a groundbreaking deep-learning framework enabling high-quality and real-time optoacoustic imaging via multispectral optoacoustic tomography (MSOT). With potentially transformative implications for health care, this innovation might redefine medical imaging standards. To stay ahead of developments in AI, look here first. DeepMB breakthrough DeepMB resolves the longstanding tradeoff between image quality and speed in medical imaging. The deep-learning framework uses a deep neural network for model-based reconstruction, allowing for fast, high-quality imaging. DeepMB can reconstruct images approximately 1000 times faster than conventional techniques, with virtually no loss in image quality. Impressive metrics and implications The researchers accomplished accurate optoacoustic image reconstruction in just 31 milliseconds per image by training the system to pairingly synthesize optoacoustic signals with ground-truth images. DeepMB promises to equip clinicians with immediate access to high-quality MSOT images, regardless of the patient's condition or scanned body area. The technology could extend to other imaging modalities, such as ultrasound, x-ray, and MRI, potentially changing how diseases are diagnosed and treated. Exciting prospects The development of DeepMB is a significant leap in optoacoustic imaging, promising to enhance healthcare outcomes. As DeepMB evolves, it could become integral to modern medical imaging, delivering high-quality results at previously unattainable speeds. (source) P.S. If you like this kind of analysis, I write a free newsletter that unpacks the most significant news and research in AI. Google, Meta, and OpenAI professionals are already subscribed submitted by /u/orthomax23 [link] [comments]  ( 9 min )
    [D] ForeCastNet. Neural PDEs perform global weather simulation 4 to 5 orders of magnitude faster than traditional numerical methods.
    submitted by /u/moschles [link] [comments]  ( 9 min )
    Data labeling service for keypoints / pose [D]
    I was previously using scale.ai but they have been extraordinarily slow. Does anyone have recommendations for services to label keypoints or pose? Bonus points if the labeling service is able to handle 3D / multi angle data coming from multiple cameras. I work in an academic lab and scale is <10k images per batch. submitted by /u/researchrig [link] [comments]  ( 9 min )
    [D] Need help with text-to-song diffusion model architecture
    Hey, I want to make a text-to-song diff model, but I can't figure out the architecture I have already prepared a dataset of about 5000 songs of different genres, artists. It only contains the lyrics, the genre and the song itself Do I understand correctly that I should just encode the text and genre into one vector using CLIP and hope that the model will directly follow it (not skipping words and lines), or should I somehow make timestamps in the dataset (when, where and what text is sung)? I was inspired by Chirp V1 submitted by /u/Head-Selection-9785 [link] [comments]  ( 9 min )
  • Open

    IBM's NorthPole chip runs AI image recognition 22x faster than current chips
    IBM has developed a chip called NorthPole that runs AI-based image recognition 22 times faster than current chips on the market. The chip uses a two-dimensional array of memory blocks and interconnected CPUs to process data quickly. However, it can only run specialized AI processes and not training processes or large language models. The researchers plan to test connecting multiple NorthPole chips together to overcome this limitation. Source : https://techxplore.com/news/2023-10-ibm-northpole-chip-ai-based-image.html submitted by /u/NuseAI [link] [comments]
    Email Ai
    is there a website or some Ai to help me clean my inbox, stop receiving emails from certain senders etc etc... I've heard about: Sanebox for keeping your inbox organized Mailbutler for gathering contact details and tasks EmailTree for creating AI-powered workflows But they are paid and I'm looking for free alternatives submitted by /u/JOTA-137_0 [link] [comments]
    Microsoft CEO Satya Nadella talks AI, closing the Activision Blizzard deal, and his best business decision so far
    submitted by /u/thisisinsider [link] [comments]
    Medical Student Question: Why aren't there any programs that do differential diagnosis for doctor?
    Based on input you have. This would be like an enterprise software level program I guess and you would input history and then through trawling through data locally it can generate diseases and probability patient has each disease based on data inputted Why doesn't something like this already exist? I am learning how to do differential diagnosis now and it seems use extremely rudimentary understanding of probability to diagnose things. You use clusters of symptoms and then use tests to eliminate stuff in the differential. It just seems like low hanging fruit that a program could do using tech we already have (I imagine LLMs will make it easier) submitted by /u/derpgod123 [link] [comments]
    Tried visualizing an entire script using Dall-E 3 and these are the results.
    https://preview.redd.it/vi9wx005ksvb1.jpg?width=1024&format=pjpg&auto=webp&s=75502abcae7f2337693175101cb3491b8647d70d Revived an old script and made some images for it using Dall-E 3, just to test out the workflow: https://docs.google.com/document/d/1yyWRRmd0ah5Z4u8_aNYSq9csJ8pccP24Dcs9brPHbzs/edit Was pretty fun and I think by the end I got much better at learning how to maintain the consistency between characters, direct shots, etc. -~- submitted by /u/Kulimar [link] [comments]
    Combing Thermodynamics and Diffusion Models for Collision-Free Robot Motion Planning
    Researchers from Yonsei University and UC Berkeley recently developed a new AI method for enabling autonomous robots to navigate unfamiliar environments filled with obstacles using only visual data as input. The key innovation is a customized diffusion model. Diffusion models can generate diverse motion plans by adding controlled noise. The researchers tailored the model to mimic how heat avoids insulation when dispersing through space. Similar to heat navigating around insulators, this "collision-avoiding" diffusion model learns to predict robot motions that avoid collisions with obstacles. It generates reachable goals and viable motion plans to those goals simultaneously. In simulations, this approach achieved ~98% success rates in navigating to target destinations while avoiding randomly generated obstacles using only visual map images as input. While extensive real-world testing is still needed (only 2D, only simulation), these initial results showcase promising capabilities: Enables navigation in unfamiliar environments without pre-mapping. Flexibly identifies and progresses toward reachable goals. Avoids unnecessary sensing systems for obstacle avoidance. Learns complex collision avoidance heuristics from visual data. I like the thermo + AI + robotics combination here - takes me back to my days in aerospace engineering. Pretty interesting approach. Full summary is here. Paper here. submitted by /u/Successful-Western27 [link] [comments]
    I upgraded my AI girlfriend… and now she remembers stuff about me..
    submitted by /u/spaceecon [link] [comments]
    Self-learning AI Movement Prediction: Beyond Airstriker Genesis to multi-directional predictions
    Quick update on my self-learning software experiment: Thanks to your feedback, I decided to test my prediction system on a newer tower-defense game from the Apple App Store (simply called ‘The Tower’). What's crucial to remember is that this system is not pre-trained and only learns from the current game it encounters - it starts with zero knowledge and learns exclusively from the game it's currently playing, building from the ground up without the use of deep learning or neural networks. In this game (unlike Airstriker which I’ve previously used), players don't control a spaceship or fire weapons (you play the game by ‘upgrading’ your weapons, etc.). It's simpler because there's only one type of enemy that always approaches the center, so the system cannot demonstrate its capabilities for differentiation in this case. But this simplicity presents some other interesting challenges: Enemies approach from all 360-degree directions, pushing the boundaries of the path prediction software. They overlap during explosions, demanding the system to separate them. There's also more visual clutter, including static lines and a non-black background. The system's predictive performance has been remarkably strong. I’ve put together an overlay video to visually demonstrate how the system learns and adapts in this new game. Note: If things don’t align perfectly in there, it’s due to my poor video editing skills… Your feedback is appreciated as always! submitted by /u/_timmah_ [link] [comments]
    A scary thought...
    Without us, artificial intelligence just becomes intelligence submitted by /u/cognaceast [link] [comments]
    AI RPG DALL-E 3
    submitted by /u/the_anonymizer [link] [comments]
    Could machine learning produce a "simple" AI algorithm that performs better than what a human programmer could create in a reasonable amount of time?
    Let me clarify what I'm asking through an example: Artificial Intelligence in videogames has failed to develop in any meaningful way over the past two decades, at least as far as the typical end-user is concerned, and nowhere is this more apparent than in strategy games. Whether we're talking about the 90's or today, AI opponents typically have to receive significant cheats in order to provide a challenging experience for the player. This is widely considered undesirable, can harm immersion or a sense of fair-play, and leads to the concept of "cheesing" the AI (exploiting obvious weaknesses in the AI logic, something which is sometimes necessary if an AI receives such strong bonuses that any strategy you might attempt against another human player would be impossible to execute successfull…
  • Open

    Stability of a superegg
    Three weeks ago I wrote about supereggs, a shape popularized by Piet Hein. One aspect of supereggs that I did not address is their stability. Looking at the photo above, you could imagine that if you gave the object a slight nudge it would not fall over. Your intuition would be right: supereggs are stable. […] Stability of a superegg first appeared on John D. Cook.  ( 6 min )
    Best-of-five versus Best-of-seven
    Suppose that when Team X and Team Y play, the probability that X will win a single game is p and the probability that Y will win is q = 1 − p. What is the probability that X will win the majority of a series of N games for some odd number N? We know intuitively […] Best-of-five versus Best-of-seven first appeared on John D. Cook.  ( 5 min )
  • Open

    How the Self Play algorithm masters Multi-Agent AI
    submitted by /u/AvvYaa [link] [comments]
    Mujoco RL Robotic Arm
    Hi everyone, I'm new to robotic arms and I want to learn more about how to implement them using mujoco env. I'm looking for some open-source projects on github that I can run and understand. I tried MuJoCo_RL_UR5 repo but it didn't work well for me, it only deployed a random agent. Do you have any recommendations for good repos that are beginner-friendly and well-documented? submitted by /u/satyamstar [link] [comments]
    Why does bellman equation converge?
    After multiple iterations the value function converge by bellaman updates (vale iteration algorithm). Can someone provide a intuitive reasoning why the value converges? submitted by /u/RaceCondition01 [link] [comments]
  • Open

    Replay game input with image classification
    TensorFlow Keras correcting camera horizon in AC Valhalla https://www.youtube.com/watch?v=ASy-2zOMj_Y submitted by /u/Kostiantyn-Dvornik [link] [comments]
    How I determine neuron layers and amount of neurons in?
    Hello, I’m newbie in neural networks and I wonder, how do people decide how many hidden layers there will be and how many neurons will be inside? What the logic behind? submitted by /u/Particular-Song-633 [link] [comments]
    Unboxing Neuro Symbolic Reasoning and Learning
    submitted by /u/Neurosymbolic [link] [comments]

  • Open

    Google, other search engines' use of generative AI threatens $68B SEO industry
    The rise of generative AI in search engines like Google threatens the $68 billion search engine optimization (SEO) industry. Generative AI tools like ChatGPT aim to provide direct answers to user queries, bypassing the need for users to click on search results. This could render SEO efforts useless and impact the revenues of SEO consultants and search engines. However, generative AI search engines still face challenges such as providing incorrect or plagiarized answers, and gaining user trust and loyalty. Search engines have been quick to experiment with generative AI to improve search results, with Google's Bard, Microsoft's Bing AI, Baidu's ERNIE, and DuckDuckGo's DuckAssist being examples of this approach. As the quality of AI-generated answers improves, users will have less incentive to browse through search result listings, impacting the revenues of SEO consultants and search engines. The SEO industry generated $68.1 billion globally in 2022 and was expected to reach $129.6 billion by 2030, but the emergence of generative AI puts the industry at risk of obsolescence. Generative AI search engines are still in their infancy and face challenges such as providing incorrect or plagiarized answers, limiting their trust and loyalty among users. However, with the resources available to researchers, it is safe to assume that generative AI models will improve over time, leading to the potential death of the SEO industry. Source : https://theconversation.com/why-google-bing-and-other-search-engines-embrace-of-generative-ai-threatens-68-billion-seo-industry-210243 submitted by /u/NuseAI [link] [comments]
    Experimented with Fully Automating TikTok Video Creation Using AI for a Month - Here's What I Learned
    Hi everyone, I recently undertook a personal project where I tried to automate the entire process of creating TikTok videos using various AI tools. The goal was to see how advanced we've come in terms of AI's capabilities in content creation and to explore the nuances of automating a traditionally 'human' task. Here's a brief breakdown: Scripting: Leveraged ChatGPT for generating video scripts. Voiceovers: Used ElevenLabs for lifelike voice narration. Video Creation: Employed a combination of StableDiffusion Animate & Replicate. Editing: Automated the editing process to sync with the AI-generated voiceovers. After setting everything up, I ran the system for a month, generating 3 videos daily. The results were intriguing and a mix of expected and unexpected outcomes. Would love to hear thoughts, feedback, or similar experiences from the community. Are there other creative ways you've seen or used AI in content creation? submitted by /u/General_crypto [link] [comments]
    AI RPG (Dall-E 3)
    submitted by /u/the_anonymizer [link] [comments]
    Thanks to AI, the future of programming may involve YELLING IN ALL CAPS
    The future of programming may involve human-like communication techniques, including yelling in all caps. OpenAI's DALL-E 3 AI image generator integrated into ChatGPT revealed internal prompts shared between the image generator and the AI assistant. The prompts included commands written in all-caps for emphasis. This shows that programming and communicating with computers may become more human-like in the future. Previously, programs used specialized data formats and APIs to communicate, but now large language models allow for cross-program interaction in conventional English. OpenAI trained GPT-4, the AI model used in ChatGPT DALL-E interface, on hundreds of millions of documents scraped from the web, which included instances of polite language and reactions to it. The use of all-caps in the DALL-E message is interpreted as emphasis, and the model pays more attention to capitalized sentences. In the future, programming and communicating with computers may involve more emphasis and human-like communication techniques. Source : https://arstechnica.com/information-technology/2023/10/thanks-to-ai-the-future-of-programming-may-involve-yelling-in-all-caps/ submitted by /u/NuseAI [link] [comments]
    Close up view of rain hitting dust.
    submitted by /u/IllustriousVideo6145 [link] [comments]
    Impressive
    submitted by /u/the_anonymizer [link] [comments]
    Singularity Pinball.
    submitted by /u/Philipp [link] [comments]
    One-Minute Daily AI News 10/21/2023
    This dating app SciMatch uses AI to find your soulmate by your face. Snap a selfie, and let the app do the rest.[1] The Biden administration is reducing the types of semiconductors that American companies will be able to sell to China, citing the desire to close loopholes in existing regulations announced last year.[2] Business Schools Are Adding AI Education Into The Curriculum.[3] Google Pixel’s face-altering photo tool sparks AI manipulation debate.[4] Sources: [1] https://www.foxnews.com/tech/dating-app-uses-ai-find-soul-mate-face [2] https://www.cnn.com/2023/10/18/tech/us-china-chip-export-curbs-intl-hnk/index.html [3] https://www.entrepreneur.com/business-news/business-schools-are-adding-ai-education-for-future-ceos/464054 [4] https://www.bbc.com/news/technology-67170014 submitted by /u/Excellent-Target-847 [link] [comments]
    Training AI to Play Pokemon with Reinforcement Learning
    submitted by /u/ShooBum-T [link] [comments]
    ChatGPT and Bard cannot solve every problem for you.
    My last post in this thread got almost 90k views, honestly I'm very happy that I was able to be so helpful. ​ One guy asked me why I couldn't give more details about what tools I use and what tools help me?:/ I decided to make the top 24 tools and describe what they are responsible for in 2 words. In order not to violate the rules of r/artificial I decided not to leave direct links to tools, so as not to violate the rules, as some tools can be paid, I left only links to 2 resources where I took this information, but they are fortunately free. YouTube Summaries → http://eightify.app 3D Animations → http://moviebot.io AI Assistant → http://zipzap.ai Prompts → http://wnr.ai How-to-videos → http://teachomatic.net Custom AI chatbots ➝ http://chatling.ai Remove Background ➝ http://unscreen.com Forms ➝ http://feathery.io Presentations ➝ http://beautiful.ai Learning ➝ http://albus.org Blog ➝ http://jasper.ai Videos ➝ http://descript.com Image ➝ http://tryleap.ai Resume ➝ http://mosaicml.com Grammar Check ➝ http://trinka.ai Meeting ➝ http://krisp.ai Video ➝ http://decoherence.co App development ➝ http://brancher.ai Design ➝ http://modiphy.com Coding assistant ➝ http://bito.ai Twitter assistant ➝ http://tweethunter.io Personal assistant ➝ http://chat.openai.com LinkedIn assistant ➝ http://taplio.com YouTube assistant ➝ http://vidiq.com I hope this is as useful to you as the first post I'm just sharing my experiences and observations in the field of ai. LIST AND SITE https://preview.redd.it/zgkra3plpgvb1.jpg?width=1068&format=pjpg&auto=webp&s=779003d65dfa70c58d50ad690a0e436c735cdaeb submitted by /u/PerceptionPlayful469 [link] [comments]
    Oracle loops in Nvidia's AI stack for end-to-end model development
    Oracle has partnered with Nvidia to bring Nvidia's AI stack to its marketplace, giving Oracle customers access to top-of-the-line GPUs for training models and building generative applications. Eligible enterprises can purchase Nvidia's DGX Cloud AI supercomputing platform and AI Enterprise software directly from the marketplace and start training models for deployment on the Oracle Cloud Infrastructure. Nvidia DGX Cloud offers a serverless experience for multi-node training of custom generative AI models, supporting near-limitless scale of GPU resources. Nvidia AI Enterprise helps teams accelerate the deployment of models to production, with features such as the Nvidia NeMo framework, Rapids, TensorRT LLM open-source library, and Triton Inference server. Oracle has been focused on industry partnerships for its AI efforts and has announced generative AI capabilities in its products and solutions. Source : https://venturebeat.com/ai/oracle-loops-in-nvidias-ai-stack-for-end-to-end-model-development/ submitted by /u/NuseAI [link] [comments]
  • Open

    Policy Evaluation
    I know that given a policy, I can find the value function using iterative policy evaluation. Can I, given the value function, find the policy? submitted by /u/MomoSolar [link] [comments]
    Question on advantage (re-)computation for PPO
    Hi, I've been re-reading the "What matters in on-policy reinforcement learning" paper (https://arxiv.org/abs/2006.05990), and noticed that they suggest to recompute advantages at the beginning of each epoch (choice C5, see section 3.5 and appendix B.1). I was wondering: if someone here had already tried this and seen a significant improvement (which is what the paper suggests) ? if it did not also suppose to recompute the value targets at the beginning of each epoch, which could lead to some sort of moving target issue ? Best, submitted by /u/Scrimbibete [link] [comments]
    In RL, how can we reward an action taken 5 steps ago?
    Let us say we are building a model that will learn how to play a computer game like DOTA or league of legends. If model for example, buys weapon A, and use the item's ability on opponent B, it should learn what damage it gives to opponent given the items opponent B is wearing. But we would have done a lot of other actions in between before being able to use that weapon to reward the model on what it does / how much damage it made. How does do you do delayed reward for specific action made X number of steps ago? Thank you. submitted by /u/oniongarlic88 [link] [comments]
    Zoomposium with Professor Dr. Petra Ritter: "The simulation of brains"
    Zoomposium with Professor Dr. Petra Ritter: "The simulation of brains" In another installment in our "Zoomposium Series" on the topic of "Brain Research", my colleague Axel Stöcker of the "Blog der großen Fragen" and I had the great honor and pleasure of conducting an interview with the very well-known and renowned German medical doctor and neuroscientist Professor Dr. Petra Ritter. In this context, Ms. Ritter became a co-founder and leader of the co-design project "The Virtual Brain", which is a component of the European Open Science Cloud (EOSC) and is "a neuroinformatics platform for simulating whole brain networks using biologically realistic connectivity". She is leading the development of a virtual research environment as a collaborative research platform for sensitive health data and head of the "German National Neuroscience Research Infrastructure Initiative (NFDI-Neuroscince)" and involved in the development of the "Health Data Cloud EBRAINS". Petra Ritter has been Johanna Quandt Professor and Head of the Section for Brain Simulation at the Department of Neurology with Experimental Neurology at Charité - Universitätsmedizin Berlin since 2017. There, Professor Ritter and her team are involved in the "Simulation of Brains". More at: https://philosophies.de/index.php/2023/09/17/die-simulation-von-gehirnen/ ​ https://preview.redd.it/937m7mtyvivb1.jpg?width=1000&format=pjpg&auto=webp&s=22d1a7576f2ebbe7904f0187bd7c0234df7ddb8f submitted by /u/philosophiesde [link] [comments]
  • Open

    [D] PRML reading buddy
    Hey there mates, I am a 3rd year PhD student, trying to break into good quality research (tired of trying different permutations ans combinations of X and Ys, and hitting dead end when things don't work, or worse -- being unable to explain why things work :D). I have recently decided to read PRML cover to cover (slowly) and do some of the exercises as well. Goal is to finish in 6 months (2 chapters per month). Is there anyone on a similar journey, would love to tag along and discuss nuances? submitted by /u/Zealousideal_Yak9131 [link] [comments]  ( 9 min )
    [D] What do you all think of these pearls of wisdom on “Doing Great Research”?
    About the latest Jason Wei’s tweet. submitted by /u/mildlyphd [link] [comments]  ( 9 min )
    [D] Which is the best physics engine for reinforcement learning??
    What are some of the best physics engine that we should be using to implement physics for complex reinforcement related tasks(like humanoid motions) ?? I came across mujoco, physx , pybullet, issac etc but not sure which to go with. Isaac seems to be something very interesting but the minimun requirements as per the website is 32gb of RAM which is way to much for me (I use a 8gb one). mujoco is good but the docs are very confusing and hard to get through. what do you believe is the best choice to go with?? submitted by /u/rakk109 [link] [comments]  ( 9 min )
    [D] Ensemble of Strong vs Weak Predictors
    This crossed my mind recently and after searching online I couldn't find a concrete answer: would an ensemble composed of strong predictors (let's say training on 1 model of that type had a high metric performance) perform better than an ensemble composed of weak predictors? Bonus: are there any resources that would support your position you can link below? submitted by /u/robml [link] [comments]  ( 9 min )
    [R] Decoupling Features and Classes with Self-Organizing Class Embeddings
    submitted by /u/4rtemi5 [link] [comments]  ( 9 min )
    [P] [D] Hierarchical agent learns all possible policies. Would this implementation work?
    Here's my implementation of an idea I had many years ago: a Sensorimotor Inference Engine. A machine that explores the states space of an environment, learning how to traverse the state space, learning how to manipulate the environment, which when given a goal can manipulate the environment in accordance to the goal. In other words, it's an agent which learns not one policy, but all possible policies. Doing so, I believe requires a hierarchy: layers of the same structure which learn broader and broader contexts of the environment. I have recently attempted to design an extremely simple, and modularized version of this agent: The Encoder-Predictor-Actor circuit. I need feedback, do you think it would work? if it might work, how might I train the Actor model? I think I know how to train the Encoder and Predictor models, but the Actor model will be harder to train, so if you have any ideas I'd love to hear from you! ps. sorry for the typos in the image text. a first-pass diagram of the 'simplest' implementation of a sensorimotor inference engine: the encoder-predictor-actor circuit submitted by /u/Stack3 [link] [comments]  ( 9 min )
    Career suggestions [D]
    Hi there, I need some suggestions from you experts. I am an aerospace engineer (both BSc and MSc), with a university minor in AI. It's pretty clear to me that I should have studied computer science given my passion for this world. In the last 4 years I worked as engineer in a major aerospace company, and I managed to get back on track with computer science and ML by working as a data scientist and doing ML projects applied to space, while also practicing with LLM agents. My dream is to enter the AGI world, maybe working as an "AI engineer", or working on creating true "autonomous" systems, leveraging multi-modal models maybe. What do you suggest I should focus on to reach this goal? Getting first some "credit" as an ML engineer though courses and certifications, open source projects, or maybe applying right now to some startups in the field? Thanks guys! submitted by /u/cappellino1 [link] [comments]  ( 9 min )
    A[r]xiv Dives - Generating Speech from Text with Fast Speech-2
    We’ve been diving deep into Arxiv Papers as a team on Fridays, hope it’s helpful and feel free to join live if you like the format! submitted by /u/FallMindless3563 [link] [comments]  ( 9 min )
    [R] Eureka: Human-Level Reward Design via Coding Large Language Models
    submitted by /u/MysteryInc152 [link] [comments]  ( 8 min )
    Computer Vision Project Ideas [Project]
    I am taking the computer vision course at my university. We have to do a final project but I am unable to come up with concrete ideas. These are the options: • Select a paper from the computer vision literature, implement and test the approach described in that paper • Take publicly available code, apply it to an interesting novel dataset and explore various extensions and modifications. You may also want to compare two or more systems. Running existing code on the data provided by the authors is not sufficient. • Design and implement a solution to a problem that interests you. This may earn you extra credits. Can anyone please help with what to do? submitted by /u/kxenak [link] [comments]  ( 9 min )
    [D] Can you use a different dataset to run ablation experiments?
    I am on a computer vision algorithm and I will be benchmarking my method on the MS COCO dataset, like the other methods that have been proposed for the same problem. I want to know if I can use a smaller dataset (COCO minitrain) for my ablation experiments to demonstrate the efficacy of the different components used in my algorithm and to save time and cost, or will that be a red flag to journal reviewers? submitted by /u/notEVOLVED [link] [comments]  ( 9 min )
    Searching pinecone for relative date information [R]
    I am embedding with gpt and upserting large medical reports into pinecone and then would like to query for chronological result. For example, I upload a report that consists of 10 office visits. I would like to know the date and results of the first visit and then the last visit. when I embed a query containing: How did the patient describe their pain in the last office visit in the text? pinecone doesn't understand the context of 'last' since it is just doing cosine likeness. It pulls pain information but doesn't have a clue which comes first. Any help would be greatly appreciated. submitted by /u/Silent_Case_3058 [link] [comments]  ( 9 min )
    [P] Wizard101 Auto-Buyer Script/Bot - Using OCR, OpenCV Python with multiprocessor performance improvements
    submitted by /u/HistorianCrafty3514 [link] [comments]  ( 8 min )
    [P] [D] : RAG on multilevel tabular data
    Hi, Has anyone done RAG on a multi level tabular data? If yes then what problems have you faced and how did you solve those? My model gives better answers when I converted the data to a JSON and then embedded it. But I'm looking for a better approach. submitted by /u/Euphoric-Chart1428 [link] [comments]  ( 9 min )
    [D] Is Megabyte's padding the same as streamingLLM?
    I was wondering after reading the recent streamingLLM paper https://arxiv.org/pdf/2309.17453.pdf if the attention sink they use through pre-training and inference is analogous to the learnable padding used in the MEGABYTE architecture https://arxiv.org/pdf/2305.07185.pdf although used for a different purpose? So if I just used MEGABYTE with sliding window attention at inference would it be the same as streamingLLM? submitted by /u/Additional-Ad-7043 [link] [comments]  ( 9 min )
    [D] cloud computing vs personal for ML
    I need a new PC to run NN on. My training sets are about 50GB. Would I be best building my own, or using Google colab pro? Anyone know the specs equivalent to colab Pro? submitted by /u/ajplant [link] [comments]  ( 9 min )
    [D] What is the current SOTA of self-supervised knowledge graph models?
    I want to create a research proposal in this area. Ideally, I would like to work towards self-supervised models that take as input raw (not preprocessed) data of various modalities (text, image, video, audio, ...) and output a knowledge graph of all the data contained within. For example, I could feed it the Wikipedia article about dogs and it spits back all the information contained within, structured in the form of a graph. For people who work in the same general area can you point me to the SOTA models/efforts and research groups that work in this area? And can you also highlight the current challenges to be overcome, if you are deep enough to know? ​ submitted by /u/KlutzyBiz [link] [comments]
    [D] Encoder vs Decoder Transformer for Token Classification
    Hi. I am working on TokenClassification problem which requires significant language understanding in the base model and was wondering if:- Is there any research that has shown on multiple datasets that encoder-only pretraining tasks produce more optimal results when finetuned for Token Classification tasks compared to decoder-only with same parameter sized models. Since a lot of LLM research is focused on text generation, most model are trained on decoder-only pretraining tasks, so what is the largest encoder-only pretrained model that is trained on >1T tokens. If encoder-only models do indeed produce more optimal results for Token Classification is there any empirical rule w.r.t. to parameter size that we can expect decoder-only to outperform encode-only models. (Eg. say 3B decoder-only is equivalent to 1B encoder-only with similar pretraining and finetuning data) submitted by /u/RemoteSaint [link] [comments]  ( 9 min )
    [D] Need some practical advice on choosing from different CNN model architectures.
    Hi everyone. I would just like to discuss a few things. I've spent about 2 months studying CNNs on coursera from the Deep Learning Specialization. In this time period I learnt the fundamentals and mechanisms of how CNNs work. I also took lectures on a few research papers that studied a few classical CNN models like AlexNet, LeNet-5, VGG-16. And then a few research papers that studied advanced stuff like ResNets, Inception Network, MobileNet, EfficientNet etc. Following that I studied Detection Algorithms, with a primary focus on YOLO Algorithm. I also briefly studied Regional Proposals, Semantic Segmentation, R-CNN, Fast-RCNN, Faster R-CNN, U-Net. I also learnt Face Recognition and Verification Models like Siamese Network using Triplet Loss function and Binary Classification. And also cove…  ( 10 min )
    [D] [P] Web browsing UI-based AI agent: GPT-4V-Act
    Github: GPT-4V-Act (A demo video can be found on the Github) Hi there! I'd like to share with you a project I recently developed. My inspiration came from a recent post about Set-of-Mark visual grounding in GPT-4V. Fascinatingly, my tests showed that GPT-4V, equipped with this capability, could inspect a UI screenshot and provide the precise pixel coordinates needed for steering a mouse/keyboard to perform a specified task. Motivated by this, I built a proof-of-concept web browser embedded with a co-pilot that can "view" the browser and interact with it. Currently, the demo is basic, utilizing web-scraping to morph ChatGPT Plus into an unofficial GPT-4V API at the backend. It lacks some actions and an adblock, resulting in the agent potentially being overloaded by the extensive popups …  ( 10 min )
  • Open

    Grade School & Preteen AI & Data Literacy
    I recently wrote the book “AI & Data Literacy: Empowering Citizens of Data Science” to help non-data scientists – which is most of the world – understand the risks associated with how companies capture and use your personal data to influence your viewing and buying habits… and even your political and societal beliefs.  And while… Read More »Grade School & Preteen AI & Data Literacy The post Grade School & Preteen AI & Data Literacy appeared first on Data Science Central.  ( 22 min )
  • Open

    Is there any neural network or LLM like chatgpt,midjourney that can help us train and generate custom sounds
    ​ Generating a Wide Variety of Sounds I'm a non-technical person with very little knowledge to develop AI tools and intending to learn Python and based on that My question is as follows: ​ Are there tools or chatgpt like platforms that can help people like me to generate couple of sounds like dog barks, cat meows. I want either something that can generate a variety of sounds or I want to work towards making something that cane help me generate audios like dog barks, such as fierce, aggressive ones but not just limited to dog barks but also sound focused on nature, other animals, vehicles, machinery(e.g., honks, engine sounds ), and possibly human sounds (though that's not my primary focus for now). The amount of technical Assistance Needed I also came across a tool like Teachable Machine and was wondering if it could be a solution as it does offer tools for audio. I am also aware that I would need datasets for such a task but apart from that I am not too sure about the nitty gritty or should I say the intricacies involved as well as the knowledge needed as I do assume it is likely not very easy https://www.youtube.com/watch?v=L4GOmYPPqn8&t=1854s ​ [Teachable Machine](https://teachablemachine.withgoogle.com/) ​ Inspiration I was inspired by a project I found here: [https://x.com/TheAIAnonGuy/status/1684443155448360961?s=20] ​ ​ Can anyone provide insights, guidance, or recommendations on how to accomplish this? To be fair, I'm not really sure if this is an audio-related or neural/machine learning (ML)/deep learning related learning question. But I would like more insight if this is possible on an individual scale either with teachable, code or AI or a combination of all approaches and if there are any beginner friendly ways to achieve this Thank you all for your assistance! submitted by /u/Beginning_Finding_98 [link] [comments]
  • Open

    A Unified Approach to Domain Incremental Learning with Memory: Theory and Algorithm. (arXiv:2310.12244v1 [cs.LG])
    Domain incremental learning aims to adapt to a sequence of domains with access to only a small subset of data (i.e., memory) from previous domains. Various methods have been proposed for this problem, but it is still unclear how they are related and when practitioners should choose one method over another. In response, we propose a unified framework, dubbed Unified Domain Incremental Learning (UDIL), for domain incremental learning with memory. Our UDIL **unifies** various existing methods, and our theoretical analysis shows that UDIL always achieves a tighter generalization error bound compared to these methods. The key insight is that different existing methods correspond to our bound with different **fixed** coefficients; based on insights from this unification, our UDIL allows **adaptive** coefficients during training, thereby always achieving the tightest bound. Empirical results show that our UDIL outperforms the state-of-the-art domain incremental learning methods on both synthetic and real-world datasets. Code will be available at https://github.com/Wang-ML-Lab/unified-continual-learning.  ( 2 min )
    Cooperative Minibatching in Graph Neural Networks. (arXiv:2310.12403v1 [cs.LG])
    Significant computational resources are required to train Graph Neural Networks (GNNs) at a large scale, and the process is highly data-intensive. One of the most effective ways to reduce resource requirements is minibatch training coupled with graph sampling. GNNs have the unique property that items in a minibatch have overlapping data. However, the commonly implemented Independent Minibatching approach assigns each Processing Element (PE) its own minibatch to process, leading to duplicated computations and input data access across PEs. This amplifies the Neighborhood Explosion Phenomenon (NEP), which is the main bottleneck limiting scaling. To reduce the effects of NEP in the multi-PE setting, we propose a new approach called Cooperative Minibatching. Our approach capitalizes on the fact that the size of the sampled subgraph is a concave function of the batch size, leading to significant reductions in the amount of work per seed vertex as batch sizes increase. Hence, it is favorable for processors equipped with a fast interconnect to work on a large minibatch together as a single larger processor, instead of working on separate smaller minibatches, even though global batch size is identical. We also show how to take advantage of the same phenomenon in serial execution by generating dependent consecutive minibatches. Our experimental evaluations show up to 4x bandwidth savings for fetching vertex embeddings, by simply increasing this dependency without harming model convergence. Combining our proposed approaches, we achieve up to 64% speedup over Independent Minibatching on single-node multi-GPU systems.  ( 3 min )
    Efficient Long-Range Transformers: You Need to Attend More, but Not Necessarily at Every Layer. (arXiv:2310.12442v1 [cs.CL])
    Pretrained transformer models have demonstrated remarkable performance across various natural language processing tasks. These models leverage the attention mechanism to capture long- and short-range dependencies in the sequence. However, the (full) attention mechanism incurs high computational cost - quadratic in the sequence length, which is not affordable in tasks with long sequences, e.g., inputs with 8k tokens. Although sparse attention can be used to improve computational efficiency, as suggested in existing work, it has limited modeling capacity and often fails to capture complicated dependencies in long sequences. To tackle this challenge, we propose MASFormer, an easy-to-implement transformer variant with Mixed Attention Spans. Specifically, MASFormer is equipped with full attention to capture long-range dependencies, but only at a small number of layers. For the remaining layers, MASformer only employs sparse attention to capture short-range dependencies. Our experiments on natural language modeling and generation tasks show that a decoder-only MASFormer model of 1.3B parameters can achieve competitive performance to vanilla transformers with full attention while significantly reducing computational cost (up to 75%). Additionally, we investigate the effectiveness of continual training with long sequence data and how sequence length impacts downstream generation performance, which may be of independent interest.  ( 2 min )
    How a student becomes a teacher: learning and forgetting through Spectral methods. (arXiv:2310.12612v1 [cs.LG])
    In theoretical ML, the teacher-student paradigm is often employed as an effective metaphor for real-life tuition. The above scheme proves particularly relevant when the student network is overparameterized as compared to the teacher network. Under these operating conditions, it is tempting to speculate that the student ability to handle the given task could be eventually stored in a sub-portion of the whole network. This latter should be to some extent reminiscent of the frozen teacher structure, according to suitable metrics, while being approximately invariant across different architectures of the student candidate network. Unfortunately, state-of-the-art conventional learning techniques could not help in identifying the existence of such an invariant subnetwork, due to the inherent degree of non-convexity that characterizes the examined problem. In this work, we take a leap forward by proposing a radically different optimization scheme which builds on a spectral representation of the linear transfer of information between layers. The gradient is hence calculated with respect to both eigenvalues and eigenvectors with negligible increase in terms of computational and complexity load, as compared to standard training algorithms. Working in this framework, we could isolate a stable student substructure, that mirrors the true complexity of the teacher in terms of computing neurons, path distribution and topological attributes. When pruning unimportant nodes of the trained student, as follows a ranking that reflects the optimized eigenvalues, no degradation in the recorded performance is seen above a threshold that corresponds to the effective teacher size. The observed behavior can be pictured as a genuine second-order phase transition that bears universality traits.  ( 3 min )
    An Image is Worth Multiple Words: Learning Object Level Concepts using Multi-Concept Prompt Learning. (arXiv:2310.12274v1 [cs.CV])
    Textural Inversion, a prompt learning method, learns a singular embedding for a new "word" to represent image style and appearance, allowing it to be integrated into natural language sentences to generate novel synthesised images. However, identifying and integrating multiple object-level concepts within one scene poses significant challenges even when embeddings for individual concepts are attainable. This is further confirmed by our empirical tests. To address this challenge, we introduce a framework for Multi-Concept Prompt Learning (MCPL), where multiple new "words" are simultaneously learned from a single sentence-image pair. To enhance the accuracy of word-concept correlation, we propose three regularisation techniques: Attention Masking (AttnMask) to concentrate learning on relevant areas; Prompts Contrastive Loss (PromptCL) to separate the embeddings of different concepts; and Bind adjective (Bind adj.) to associate new "words" with known words. We evaluate via image generation, editing, and attention visualisation with diverse images. Extensive quantitative comparisons demonstrate that our method can learn more semantically disentangled concepts with enhanced word-concept correlation. Additionally, we introduce a novel dataset and evaluation protocol tailored for this new task of learning object-level concepts.  ( 2 min )
    No-Regret Learning in Bilateral Trade via Global Budget Balance. (arXiv:2310.12370v1 [cs.GT])
    Bilateral trade revolves around the challenge of facilitating transactions between two strategic agents -- a seller and a buyer -- both of whom have a private valuations for the item. We study the online version of the problem, in which at each time step a new seller and buyer arrive. The learner's task is to set a price for each agent, without any knowledge about their valuations. The sequence of sellers and buyers is chosen by an oblivious adversary. In this setting, known negative results rule out the possibility of designing algorithms with sublinear regret when the learner has to guarantee budget balance for each iteration. In this paper, we introduce the notion of global budget balance, which requires the agent to be budget balance only over the entire time horizon. By requiring global budget balance, we provide the first no-regret algorithms for bilateral trade with adversarial inputs under various feedback models. First, we show that in the full-feedback model the learner can guarantee $\tilde{O}(\sqrt{T})$ regret against the best fixed prices in hindsight, which is order-wise optimal. Then, in the case of partial feedback models, we provide an algorithm guaranteeing a $\tilde{O}(T^{3/4})$ regret upper bound with one-bit feedback, which we complement with a nearly-matching lower bound. Finally, we investigate how these results vary when measuring regret using an alternative benchmark.  ( 2 min )
    Automated Repair of Declarative Software Specifications in the Era of Large Language Models. (arXiv:2310.12425v1 [cs.SE])
    The growing adoption of declarative software specification languages, coupled with their inherent difficulty in debugging, has underscored the need for effective and automated repair techniques applicable to such languages. Researchers have recently explored various methods to automatically repair declarative software specifications, such as template-based repair, feedback-driven iterative repair, and bounded exhaustive approaches. The latest developments in large language models provide new opportunities for the automatic repair of declarative specifications. In this study, we assess the effectiveness of utilizing OpenAI's ChatGPT to repair software specifications written in the Alloy declarative language. Unlike imperative languages, specifications in Alloy are not executed but rather translated into logical formulas and evaluated using backend constraint solvers to identify specification instances and counterexamples to assertions. Our evaluation focuses on ChatGPT's ability to improve the correctness and completeness of Alloy declarative specifications through automatic repairs. We analyze the results produced by ChatGPT and compare them with those of leading automatic Alloy repair methods. Our study revealed that while ChatGPT falls short in comparison to existing techniques, it was able to successfully repair bugs that no other technique could address. Our analysis also identified errors in ChatGPT's generated repairs, including improper operator usage, type errors, higher-order logic misuse, and relational arity mismatches. Additionally, we observed instances of hallucinations in ChatGPT-generated repairs and inconsistency in its results. Our study provides valuable insights for software practitioners, researchers, and tool builders considering ChatGPT for declarative specification repairs.  ( 3 min )
    Classification-Aided Robust Multiple Target Tracking Using Neural Enhanced Message Passing. (arXiv:2310.12407v1 [cs.LG])
    We address the challenge of tracking an unknown number of targets in strong clutter environments using measurements from a radar sensor. Leveraging the range-Doppler spectra information, we identify the measurement classes, which serve as additional information to enhance clutter rejection and data association, thus bolstering the robustness of target tracking. We first introduce a novel neural enhanced message passing approach, where the beliefs obtained by the unified message passing are fed into the neural network as additional information. The output beliefs are then utilized to refine the original beliefs. Then, we propose a classification-aided robust multiple target tracking algorithm, employing the neural enhanced message passing technique. This algorithm is comprised of three modules: a message-passing module, a neural network module, and a Dempster-Shafer module. The message-passing module is used to represent the statistical model by the factor graph and infers target kinematic states, visibility states, and data associations based on the spatial measurement information. The neural network module is employed to extract features from range-Doppler spectra and derive beliefs on whether a measurement is target-generated or clutter-generated. The Dempster-Shafer module is used to fuse the beliefs obtained from both the factor graph and the neural network. As a result, our proposed algorithm adopts a model-and-data-driven framework, effectively enhancing clutter suppression and data association, leading to significant improvements in multiple target tracking performance. We validate the effectiveness of our approach using both simulated and real data scenarios, demonstrating its capability to handle challenging tracking scenarios in practical radar applications.  ( 3 min )
    Safe RLHF: Safe Reinforcement Learning from Human Feedback. (arXiv:2310.12773v1 [cs.AI])
    With the development of large language models (LLMs), striking a balance between the performance and safety of AI systems has never been more critical. However, the inherent tension between the objectives of helpfulness and harmlessness presents a significant challenge during LLM training. To address this issue, we propose Safe Reinforcement Learning from Human Feedback (Safe RLHF), a novel algorithm for human value alignment. Safe RLHF explicitly decouples human preferences regarding helpfulness and harmlessness, effectively avoiding the crowdworkers' confusion about the tension and allowing us to train separate reward and cost models. We formalize the safety concern of LLMs as an optimization task of maximizing the reward function while satisfying specified cost constraints. Leveraging the Lagrangian method to solve this constrained problem, Safe RLHF dynamically adjusts the balance between the two objectives during fine-tuning. Through a three-round fine-tuning using Safe RLHF, we demonstrate a superior ability to mitigate harmful responses while enhancing model performance compared to existing value-aligned algorithms. Experimentally, we fine-tuned the Alpaca-7B using Safe RLHF and aligned it with collected human preferences, significantly improving its helpfulness and harmlessness according to human evaluations.  ( 2 min )
    Fast Parameter Inference on Pulsar Timing Arrays with Normalizing Flows. (arXiv:2310.12209v1 [astro-ph.IM])
    Pulsar timing arrays (PTAs) perform Bayesian posterior inference with expensive MCMC methods. Given a dataset of ~10-100 pulsars and O(10^3) timing residuals each, producing a posterior distribution for the stochastic gravitational wave background (SGWB) can take days to a week. The computational bottleneck arises because the likelihood evaluation required for MCMC is extremely costly when considering the dimensionality of the search space. Fortunately, generating simulated data is fast, so modern simulation-based inference techniques can be brought to bear on the problem. In this paper, we demonstrate how conditional normalizing flows trained on simulated data can be used for extremely fast and accurate estimation of the SGWB posteriors, reducing the sampling time from weeks to a matter of seconds.  ( 2 min )
    MuseGNN: Interpretable and Convergent Graph Neural Network Layers at Scale. (arXiv:2310.12457v1 [cs.LG])
    Among the many variants of graph neural network (GNN) architectures capable of modeling data with cross-instance relations, an important subclass involves layers designed such that the forward pass iteratively reduces a graph-regularized energy function of interest. In this way, node embeddings produced at the output layer dually serve as both predictive features for solving downstream tasks (e.g., node classification) and energy function minimizers that inherit desirable inductive biases and interpretability. However, scaling GNN architectures constructed in this way remains challenging, in part because the convergence of the forward pass may involve models with considerable depth. To tackle this limitation, we propose a sampling-based energy function and scalable GNN layers that iteratively reduce it, guided by convergence guarantees in certain settings. We also instantiate a full GNN architecture based on these designs, and the model achieves competitive accuracy and scalability when applied to the largest publicly-available node classification benchmark exceeding 1TB in size.  ( 2 min )
    Closed-Form Diffusion Models. (arXiv:2310.12395v1 [cs.LG])
    Score-based generative models (SGMs) sample from a target distribution by iteratively transforming noise using the score function of the perturbed target. For any finite training set, this score function can be evaluated in closed form, but the resulting SGM memorizes its training data and does not generate novel samples. In practice, one approximates the score by training a neural network via score-matching. The error in this approximation promotes generalization, but neural SGMs are costly to train and sample, and the effective regularization this error provides is not well-understood theoretically. In this work, we instead explicitly smooth the closed-form score to obtain an SGM that generates novel samples without training. We analyze our model and propose an efficient nearest-neighbor-based estimator of its score function. Using this estimator, our method achieves sampling times competitive with neural SGMs while running on consumer-grade CPUs.  ( 2 min )
    Exploring Graph Neural Networks for Indian Legal Judgment Prediction. (arXiv:2310.12800v1 [cs.LG])
    The burdensome impact of a skewed judges-to-cases ratio on the judicial system manifests in an overwhelming backlog of pending cases alongside an ongoing influx of new ones. To tackle this issue and expedite the judicial process, the proposition of an automated system capable of suggesting case outcomes based on factual evidence and precedent from past cases gains significance. This research paper centres on developing a graph neural network-based model to address the Legal Judgment Prediction (LJP) problem, recognizing the intrinsic graph structure of judicial cases and making it a binary node classification problem. We explored various embeddings as model features, while nodes such as time nodes and judicial acts were added and pruned to evaluate the model's performance. The study is done while considering the ethical dimension of fairness in these predictions, considering gender and name biases. A link prediction task is also conducted to assess the model's proficiency in anticipating connections between two specified nodes. By harnessing the capabilities of graph neural networks and incorporating fairness analyses, this research aims to contribute insights towards streamlining the adjudication process, enhancing judicial efficiency, and fostering a more equitable legal landscape, ultimately alleviating the strain imposed by mounting case backlogs. Our best-performing model with XLNet pre-trained embeddings as its features gives the macro F1 score of 75% for the LJP task. For link prediction, the same set of features is the best performing giving ROC of more than 80%
    Generative modeling, design and analysis of spider silk protein sequences for enhanced mechanical properties. (arXiv:2309.10170v1 [cond-mat.mtrl-sci] CROSS LISTED)
    Spider silks are remarkable materials characterized by superb mechanical properties such as strength, extensibility and lightweightedness. Yet, to date, limited models are available to fully explore sequence-property relationships for analysis and design. Here we propose a custom generative large-language model to enable design of novel spider silk protein sequences to meet complex combinations of target mechanical properties. The model, pretrained on a large set of protein sequences, is fine-tuned on ~1,000 major ampullate spidroin (MaSp) sequences for which associated fiber-level mechanical properties exist, to yield an end-to-end forward and inverse generative strategy. Performance is assessed through: (1), a novelty analysis and protein type classification for generated spidroin sequences through BLAST searches, (2) property evaluation and comparison with similar sequences, (3) comparison of molecular structures, as well as, and (4) a detailed sequence motif analyses. We generate silk sequences with property combinations that do not exist in nature, and develop a deep understanding the mechanistic roles of sequence patterns in achieving overarching key mechanical properties (elastic modulus, strength, toughness, failure strain). The model provides an efficient approach to expand the silkome dataset, facilitating further sequence-structure analyses of silks, and establishes a foundation for synthetic silk design and optimization.
    TabuLa: Harnessing Language Models for Tabular Data Synthesis. (arXiv:2310.12746v1 [cs.LG])
    Given the ubiquitous use of tabular data in industries and the growing concerns in data privacy and security, tabular data synthesis emerges as a critical research area. The recent state-of-the-art methods show that large language models (LLMs) can be adopted to generate realistic tabular data. As LLMs pre-process tabular data as full text, they have the advantage of avoiding the curse of dimensionality associated with one-hot encoding high-dimensional data. However, their long training time and limited re-usability on new tasks prevent them from replacing exiting tabular generative models. In this paper, we propose Tabula, a tabular data synthesizer based on the language model structure. Through Tabula, we demonstrate the inherent limitation of employing pre-trained language models designed for natural language processing (NLP) in the context of tabular data synthesis. Our investigation delves into the development of a dedicated foundational model tailored specifically for tabular data synthesis. Additionally, we propose a token sequence compression strategy to significantly reduce training time while preserving the quality of synthetic data. Extensive experiments on six datasets demonstrate that using a language model structure without loading the well-trained model weights yields a better starting model for tabular data synthesis. Moreover, the Tabula model, previously trained on other tabular data, serves as an excellent foundation model for new tabular data synthesis tasks. Additionally, the token sequence compression method substantially reduces the model's training time. Results show that Tabula averagely reduces 46.2% training time per epoch comparing to current LLMs-based state-of-the-art algorithm and consistently achieves even higher synthetic data utility.
    The Power of Populations in Decentralized Learning Dynamics. (arXiv:2306.08670v2 [cs.LG] UPDATED)
    We study a distributed multi-armed bandit setting among a population of $n$ memory-constrained nodes in the gossip model: at each round, every node locally adopts one of $m$ arms, observes a reward drawn from the arm's (adversarially chosen) distribution, and then communicates with a randomly sampled neighbor, exchanging information to determine its policy in the next round. We introduce and analyze several families of dynamics for this task that are decentralized: each node's decision is entirely local and depends only on its most recently obtained reward and that of the neighbor it sampled. We show a connection between the global evolution of these decentralized dynamics with a certain class of "zero-sum" multiplicative weights update algorithms, and we develop a general framework for analyzing the population-level regret of these natural protocols. Using this framework, we derive sublinear regret bounds under a wide range of parameter regimes (i.e., the size of the population and number of arms) for both the stationary reward setting (where the mean of each arm's distribution is fixed over time) and the adversarial reward setting (where means can vary over time). Further, we show that these protocols can approximately optimize convex functions over the simplex when the reward distributions are generated from a stochastic gradient oracle.
    Adaptive Pairwise Encodings for Link Prediction. (arXiv:2310.11009v2 [cs.LG] UPDATED)
    Link prediction is a common task on graph-structured data that has seen applications in a variety of domains. Classically, hand-crafted heuristics were used for this task. Heuristic measures are chosen such that they correlate well with the underlying factors related to link formation. In recent years, a new class of methods has emerged that combines the advantages of message-passing neural networks (MPNN) and heuristics methods. These methods perform predictions by using the output of an MPNN in conjunction with a "pairwise encoding" that captures the relationship between nodes in the candidate link. They have been shown to achieve strong performance on numerous datasets. However, current pairwise encodings often contain a strong inductive bias, using the same underlying factors to classify all links. This limits the ability of existing methods to learn how to properly classify a variety of different links that may form from different factors. To address this limitation, we propose a new method, LPFormer, which attempts to adaptively learn the pairwise encodings for each link. LPFormer models the link factors via an attention module that learns the pairwise encoding that exists between nodes by modeling multiple factors integral to link prediction. Extensive experiments demonstrate that LPFormer can achieve SOTA performance on numerous datasets while maintaining efficiency.
    Schema First! Learn Versatile Knowledge Graph Embeddings by Capturing Semantics with MASCHInE. (arXiv:2306.03659v2 [cs.AI] UPDATED)
    Knowledge graph embedding models (KGEMs) have gained considerable traction in recent years. These models learn a vector representation of knowledge graph entities and relations, a.k.a. knowledge graph embeddings (KGEs). Learning versatile KGEs is desirable as it makes them useful for a broad range of tasks. However, KGEMs are usually trained for a specific task, which makes their embeddings task-dependent. In parallel, the widespread assumption that KGEMs actually create a semantic representation of the underlying entities and relations (e.g., project similar entities closer than dissimilar ones) has been challenged. In this work, we design heuristics for generating protographs -- small, modified versions of a KG that leverage RDF/S information. The learnt protograph-based embeddings are meant to encapsulate the semantics of a KG, and can be leveraged in learning KGEs that, in turn, also better capture semantics. Extensive experiments on various evaluation benchmarks demonstrate the soundness of this approach, which we call Modular and Agnostic SCHema-based Integration of protograph Embeddings (MASCHInE). In particular, MASCHInE helps produce more versatile KGEs that yield substantially better performance for entity clustering and node classification tasks. For link prediction, using MASCHinE substantially increases the number of semantically valid predictions with equivalent rank-based performance.
    Machine Learning Based Compensation for Inconsistencies in Knitted Force Sensors. (arXiv:2306.12129v2 [eess.SY] UPDATED)
    Knitted sensors frequently suffer from inconsistencies due to innate effects such as offset, relaxation, and drift. These properties, in combination, make it challenging to reliably map from sensor data to physical actuation. In this paper, we demonstrate a method for counteracting this by applying processing using a minimal artificial neural network (ANN) in combination with straightforward pre-processing. We apply a number of exponential smoothing filters on a re-sampled sensor signal, to produce features that preserve different levels of historical sensor data and, in combination, represent an adequate state of previous sensor actuation. By training a three-layer ANN with a total of 8 neurons, we manage to significantly improve the mapping between sensor reading and actuation force. Our findings also show that our technique translates to sensors of reasonably different composition in terms of material and structure, and it can furthermore be applied to related physical features such as strain.
    SNIP: Bridging Mathematical Symbolic and Numeric Realms with Unified Pre-training. (arXiv:2310.02227v2 [cs.LG] UPDATED)
    In an era where symbolic mathematical equations are indispensable for modeling complex natural phenomena, scientific inquiry often involves collecting observations and translating them into mathematical expressions. Recently, deep learning has emerged as a powerful tool for extracting insights from data. However, existing models typically specialize in either numeric or symbolic domains, and are usually trained in a supervised manner tailored to specific tasks. This approach neglects the substantial benefits that could arise from a task-agnostic unified understanding between symbolic equations and their numeric counterparts. To bridge the gap, we introduce SNIP, a Symbolic-Numeric Integrated Pre-training, which employs joint contrastive learning between symbolic and numeric domains, enhancing their mutual similarities in the pre-trained embeddings. By performing latent space analysis, we observe that SNIP provides cross-domain insights into the representations, revealing that symbolic supervision enhances the embeddings of numeric data and vice versa. We evaluate SNIP across diverse tasks, including symbolic-to-numeric mathematical property prediction and numeric-to-symbolic equation discovery, commonly known as symbolic regression. Results show that SNIP effectively transfers to various tasks, consistently outperforming fully supervised baselines and competing strongly with established task-specific methods, especially in few-shot learning scenarios where available data is limited.
    Parallel Bayesian Optimization Using Satisficing Thompson Sampling for Time-Sensitive Black-Box Optimization. (arXiv:2310.12526v1 [cs.LG])
    Bayesian optimization (BO) is widely used for black-box optimization problems, and have been shown to perform well in various real-world tasks. However, most of the existing BO methods aim to learn the optimal solution, which may become infeasible when the parameter space is extremely large or the problem is time-sensitive. In these contexts, switching to a satisficing solution that requires less information can result in better performance. In this work, we focus on time-sensitive black-box optimization problems and propose satisficing Thompson sampling-based parallel Bayesian optimization (STS-PBO) approaches, including synchronous and asynchronous versions. We shift the target from an optimal solution to a satisficing solution that is easier to learn. The rate-distortion theory is introduced to construct a loss function that balances the amount of information that needs to be learned with sub-optimality, and the Blahut-Arimoto algorithm is adopted to compute the target solution that reaches the minimum information rate under the distortion limit at each step. Both discounted and undiscounted Bayesian cumulative regret bounds are theoretically derived for the proposed STS-PBO approaches. The effectiveness of the proposed methods is demonstrated on a fast-charging design problem of Lithium-ion batteries. The results are accordant with theoretical analyses, and show that our STS-PBO methods outperform both sequential counterparts and parallel BO with traditional Thompson sampling in both synchronous and asynchronous settings.
    Constrained Reweighting of Distributions: an Optimal Transport Approach. (arXiv:2310.12447v1 [stat.ML])
    We commonly encounter the problem of identifying an optimally weight adjusted version of the empirical distribution of observed data, adhering to predefined constraints on the weights. Such constraints often manifest as restrictions on the moments, tail behaviour, shapes, number of modes, etc., of the resulting weight adjusted empirical distribution. In this article, we substantially enhance the flexibility of such methodology by introducing a nonparametrically imbued distributional constraints on the weights, and developing a general framework leveraging the maximum entropy principle and tools from optimal transport. The key idea is to ensure that the maximum entropy weight adjusted empirical distribution of the observed data is close to a pre-specified probability distribution in terms of the optimal transport metric while allowing for subtle departures. The versatility of the framework is demonstrated in the context of three disparate applications where data re-weighting is warranted to satisfy side constraints on the optimization problem at the heart of the statistical task: namely, portfolio allocation, semi-parametric inference for complex surveys, and ensuring algorithmic fairness in machine learning algorithms.
    Transformers for scientific data: a pedagogical review for astronomers. (arXiv:2310.12069v2 [astro-ph.IM] UPDATED)
    The deep learning architecture associated with ChatGPT and related generative AI products is known as transformers. Initially applied to Natural Language Processing, transformers and the self-attention mechanism they exploit have gained widespread interest across the natural sciences. The goal of this pedagogical and informal review is to introduce transformers to scientists. The review includes the mathematics underlying the attention mechanism, a description of the original transformer architecture, and a section on applications to time series and imaging data in astronomy. We include a Frequently Asked Questions section for readers who are curious about generative AI or interested in getting started with transformers for their research problem.
    Protein 3D Graph Structure Learning for Robust Structure-based Protein Property Prediction. (arXiv:2310.11466v2 [cs.LG] UPDATED)
    Protein structure-based property prediction has emerged as a promising approach for various biological tasks, such as protein function prediction and sub-cellular location estimation. The existing methods highly rely on experimental protein structure data and fail in scenarios where these data are unavailable. Predicted protein structures from AI tools (e.g., AlphaFold2) were utilized as alternatives. However, we observed that current practices, which simply employ accurately predicted structures during inference, suffer from notable degradation in prediction accuracy. While similar phenomena have been extensively studied in general fields (e.g., Computer Vision) as model robustness, their impact on protein property prediction remains unexplored. In this paper, we first investigate the reason behind the performance decrease when utilizing predicted structures, attributing it to the structure embedding bias from the perspective of structure representation learning. To study this problem, we identify a Protein 3D Graph Structure Learning Problem for Robust Protein Property Prediction (PGSL-RP3), collect benchmark datasets, and present a protein Structure embedding Alignment Optimization framework (SAO) to mitigate the problem of structure embedding bias between the predicted and experimental protein structures. Extensive experiments have shown that our framework is model-agnostic and effective in improving the property prediction of both predicted structures and experimental structures. The benchmark datasets and codes will be released to benefit the community.
    Efficient Dataset Distillation through Alignment with Smooth and High-Quality Expert Trajectories. (arXiv:2310.10541v1 [cs.CV] CROSS LISTED)
    Training a large and state-of-the-art machine learning model typically necessitates the use of large-scale datasets, which, in turn, makes the training and parameter-tuning process expensive and time-consuming. Some researchers opt to distil information from real-world datasets into tiny and compact synthetic datasets while maintaining their ability to train a well-performing model, hence proposing a data-efficient method known as Dataset Distillation (DD). Despite recent progress in this field, existing methods still underperform and cannot effectively replace large datasets. In this paper, unlike previous methods that focus solely on improving the efficacy of student distillation, we are the first to recognize the important interplay between expert and student. We argue the significant impact of expert smoothness when employing more potent expert trajectories in subsequent dataset distillation. Based on this, we introduce the integration of clipping loss and gradient penalty to regulate the rate of parameter changes in expert trajectories. Furthermore, in response to the sensitivity exhibited towards randomly initialized variables during distillation, we propose representative initialization for synthetic dataset and balanced inner-loop loss. Finally, we present two enhancement strategies, namely intermediate matching loss and weight perturbation, to mitigate the potential occurrence of cumulative errors. We conduct extensive experiments on datasets of different scales, sizes, and resolutions. The results demonstrate that the proposed method significantly outperforms prior methods.
    The Kernel Density Integral Transformation. (arXiv:2309.10194v2 [stat.ML] UPDATED)
    Feature preprocessing continues to play a critical role when applying machine learning and statistical methods to tabular data. In this paper, we propose the use of the kernel density integral transformation as a feature preprocessing step. Our approach subsumes the two leading feature preprocessing methods as limiting cases: linear min-max scaling and quantile transformation. We demonstrate that, without hyperparameter tuning, the kernel density integral transformation can be used as a simple drop-in replacement for either method, offering protection from the weaknesses of each. Alternatively, with tuning of a single continuous hyperparameter, we frequently outperform both of these methods. Finally, we show that the kernel density transformation can be profitably applied to statistical data analysis, particularly in correlation analysis and univariate clustering.
    Microscaling Data Formats for Deep Learning. (arXiv:2310.10537v3 [cs.LG] UPDATED)
    Narrow bit-width data formats are key to reducing the computational and storage costs of modern deep learning applications. This paper evaluates Microscaling (MX) data formats that combine a per-block scaling factor with narrow floating-point and integer types for individual elements. MX formats balance the competing needs of hardware efficiency, model accuracy, and user friction. Empirical results on over two dozen benchmarks demonstrate practicality of MX data formats as a drop-in replacement for baseline FP32 for AI inference and training with low user friction. We also show the first instance of training generative language models at sub-8-bit weights, activations, and gradients with minimal accuracy loss and no modifications to the training recipe.
    Improving Generalization of Alignment with Human Preferences through Group Invariant Learning. (arXiv:2310.11971v2 [cs.LG] UPDATED)
    The success of AI assistants based on language models (LLMs) hinges crucially on Reinforcement Learning from Human Feedback (RLHF), which enables the generation of responses more aligned with human preferences. As universal AI assistants, there's a growing expectation for them to perform consistently across various domains. However, previous work shows that Reinforcement Learning (RL) often exploits shortcuts to attain high rewards and overlooks challenging samples. This focus on quick reward gains undermines both the stability in training and the model's ability to generalize to new, unseen data. In this work, we propose a novel approach that can learn a consistent policy via RL across various data groups or domains. Given the challenges associated with acquiring group annotations, our method automatically classifies data into different groups, deliberately maximizing performance variance. Then, we optimize the policy to perform well on challenging groups. Lastly, leveraging the established groups, our approach adaptively adjusts the exploration space, allocating more learning capacity to more challenging data and preventing the model from over-optimizing on simpler data. Experimental results indicate that our approach significantly enhances training stability and model generalization.
    When Rigidity Hurts: Soft Consistency Regularization for Probabilistic Hierarchical Time Series Forecasting. (arXiv:2310.11569v2 [cs.LG] UPDATED)
    Probabilistic hierarchical time-series forecasting is an important variant of time-series forecasting, where the goal is to model and forecast multivariate time-series that have underlying hierarchical relations. Most methods focus on point predictions and do not provide well-calibrated probabilistic forecasts distributions. Recent state-of-art probabilistic forecasting methods also impose hierarchical relations on point predictions and samples of distribution which does not account for coherency of forecast distributions. Previous works also silently assume that datasets are always consistent with given hierarchical relations and do not adapt to real-world datasets that show deviation from this assumption. We close both these gap and propose PROFHiT, which is a fully probabilistic hierarchical forecasting model that jointly models forecast distribution of entire hierarchy. PROFHiT uses a flexible probabilistic Bayesian approach and introduces a novel Distributional Coherency regularization to learn from hierarchical relations for entire forecast distribution that enables robust and calibrated forecasts as well as adapt to datasets of varying hierarchical consistency. On evaluating PROFHiT over wide range of datasets, we observed 41-88% better performance in accuracy and significantly better calibration. Due to modeling the coherency over full distribution, we observed that PROFHiT can robustly provide reliable forecasts even if up to 10% of input time-series data is missing where other methods' performance severely degrade by over 70%.
    Fed-GraB: Federated Long-tailed Learning with Self-Adjusting Gradient Balancer. (arXiv:2310.07587v2 [cs.LG] UPDATED)
    Data privacy and long-tailed distribution are the norms rather than the exception in many real-world tasks. This paper investigates a federated long-tailed learning (Fed-LT) task in which each client holds a locally heterogeneous dataset; if the datasets can be globally aggregated, they jointly exhibit a long-tailed distribution. Under such a setting, existing federated optimization and/or centralized long-tailed learning methods hardly apply due to challenges in (a) characterizing the global long-tailed distribution under privacy constraints and (b) adjusting the local learning strategy to cope with the head-tail imbalance. In response, we propose a method termed $\texttt{Fed-GraB}$, comprised of a Self-adjusting Gradient Balancer (SGB) module that re-weights clients' gradients in a closed-loop manner, based on the feedback of global long-tailed distribution evaluated by a Direct Prior Analyzer (DPA) module. Using $\texttt{Fed-GraB}$, clients can effectively alleviate the distribution drift caused by data heterogeneity during the model training process and obtain a global model with better performance on the minority classes while maintaining the performance of the majority classes. Extensive experiments demonstrate that $\texttt{Fed-GraB}$ achieves state-of-the-art performance on representative datasets such as CIFAR-10-LT, CIFAR-100-LT, ImageNet-LT, and iNaturalist.
    ACES: Generating Diverse Programming Puzzles with Autotelic Language Models and Semantic Descriptors. (arXiv:2310.10692v2 [cs.LG] UPDATED)
    Finding and selecting new and interesting problems to solve is at the heart of curiosity, science and innovation. We here study automated problem generation in the context of the open-ended space of python programming puzzles. Existing generative models often aim at modeling a reference distribution without any explicit diversity optimization. Other methods explicitly optimizing for diversity do so either in limited hand-coded representation spaces or in uninterpretable learned embedding spaces that may not align with human perceptions of interesting variations. With ACES (Autotelic Code Exploration via Semantic descriptors), we introduce a new autotelic generation method that leverages semantic descriptors produced by a large language model (LLM) to directly optimize for interesting diversity, as well as few-shot-based generation. Each puzzle is labeled along 10 dimensions, each capturing a programming skill required to solve it. ACES generates and pursues novel and feasible goals to explore that abstract semantic space, slowly discovering a diversity of solvable programming puzzles in any given run. Across a set of experiments, we show that ACES discovers a richer diversity of puzzles than existing diversity-maximizing algorithms as measured across a range of diversity metrics. We further study whether and in which conditions this diversity can translate into the successful training of puzzle solving models.
    URL: A Representation Learning Benchmark for Transferable Uncertainty Estimates. (arXiv:2307.03810v2 [cs.LG] UPDATED)
    Representation learning has significantly driven the field to develop pretrained models that can act as a valuable starting point when transferring to new datasets. With the rising demand for reliable machine learning and uncertainty quantification, there is a need for pretrained models that not only provide embeddings but also transferable uncertainty estimates. To guide the development of such models, we propose the Uncertainty-aware Representation Learning (URL) benchmark. Besides the transferability of the representations, it also measures the zero-shot transferability of the uncertainty estimate using a novel metric. We apply URL to evaluate eleven uncertainty quantifiers that are pretrained on ImageNet and transferred to eight downstream datasets. We find that approaches that focus on the uncertainty of the representation itself or estimate the prediction risk directly outperform those that are based on the probabilities of upstream classes. Yet, achieving transferable uncertainty quantification remains an open challenge. Our findings indicate that it is not necessarily in conflict with traditional representation learning goals. Code is provided under https://github.com/mkirchhof/url .
    On the power of graph neural networks and the role of the activation function. (arXiv:2307.04661v2 [cs.LG] UPDATED)
    In this article we present new results about the expressivity of Graph Neural Networks (GNNs). We prove that for any GNN with piecewise polynomial activations, whose architecture size does not grow with the graph input sizes, there exists a pair of non-isomorphic rooted trees of depth two such that the GNN cannot distinguish their root vertex up to an arbitrary number of iterations. The proof relies on tools from the algebra of symmetric polynomials. In contrast, it was already known that unbounded GNNs (those whose size is allowed to change with the graph sizes) with piecewise polynomial activations can distinguish these vertices in only two iterations. Our results imply a strict separation between bounded and unbounded size GNNs, answering an open question formulated by [Grohe, 2021]. We next prove that if one allows activations that are not piecewise polynomial, then in two iterations a single neuron perceptron can distinguish the root vertices of any pair of nonisomorphic trees of depth two (our results hold for activations like the sigmoid, hyperbolic tan and others). This shows how the power of graph neural networks can change drastically if one changes the activation function of the neural networks. The proof of this result utilizes the Lindemann-Weierstrauss theorem from transcendental number theory.
    In-Context Pretraining: Language Modeling Beyond Document Boundaries. (arXiv:2310.10638v2 [cs.CL] UPDATED)
    Large language models (LMs) are currently trained to predict tokens given document prefixes, enabling them to directly perform long-form generation and prompting-style tasks which can be reduced to document completion. Existing pretraining pipelines train LMs by concatenating random sets of short documents to create input contexts but the prior documents provide no signal for predicting the next document. We instead present In-Context Pretraining, a new approach where language models are pretrained on a sequence of related documents, thereby explicitly encouraging them to read and reason across document boundaries. We can do In-Context Pretraining by simply changing the document ordering so that each context contains related documents, and directly applying existing pretraining pipelines. However, this document sorting problem is challenging. There are billions of documents and we would like the sort to maximize contextual similarity for every document without repeating any data. To do this, we introduce approximate algorithms for finding related documents with efficient nearest neighbor search and constructing coherent input contexts with a graph traversal algorithm. Our experiments show In-Context Pretraining offers a simple and scalable approach to significantly enhance LMs'performance: we see notable improvements in tasks that require more complex contextual reasoning, including in-context learning (+8%), reading comprehension (+15%), faithfulness to previous contexts (+16%), long-context reasoning (+5%), and retrieval augmentation (+9%).
    HGCVAE: Integrating Generative and Contrastive Learning for Heterogeneous Graph Learning. (arXiv:2310.11102v3 [cs.LG] UPDATED)
    Generative self-supervised learning (SSL) has exhibited significant potential and garnered increasing interest in graph learning. In this study, we aim to explore the problem of generative SSL in the context of heterogeneous graph learning (HGL). The previous SSL approaches for heterogeneous graphs have primarily relied on contrastive learning, necessitating the design of complex views to capture heterogeneity. However, existing generative SSL methods have not fully leveraged the capabilities of generative models to address the challenges of HGL. In this paper, we present HGCVAE, a novel contrastive variational graph auto-encoder that liberates HGL from the burden of intricate heterogeneity capturing. Instead of focusing on complicated heterogeneity, HGCVAE harnesses the full potential of generative SSL. HGCVAE innovatively consolidates contrastive learning with generative SSL, introducing several key innovations. Firstly, we employ a progressive mechanism to generate high-quality hard negative samples for contrastive learning, utilizing the power of variational inference. Additionally, we present a dynamic mask strategy to ensure effective and stable learning. Moreover, we propose an enhanced scaled cosine error as the criterion for better attribute reconstruction. As an initial step in combining generative and contrastive SSL, HGCVAE achieves remarkable results compared to various state-of-the-art baselines, confirming its superiority.
    Deep Probabilistic Movement Primitives with a Bayesian Aggregator. (arXiv:2307.05141v2 [cs.RO] UPDATED)
    Movement primitives are trainable parametric models that reproduce robotic movements starting from a limited set of demonstrations. Previous works proposed simple linear models that exhibited high sample efficiency and generalization power by allowing temporal modulation of movements (reproducing movements faster or slower), blending (merging two movements into one), via-point conditioning (constraining a movement to meet some particular via-points) and context conditioning (generation of movements based on an observed variable, e.g., position of an object). Previous works have proposed neural network-based motor primitive models, having demonstrated their capacity to perform tasks with some forms of input conditioning or time-modulation representations. However, there has not been a single unified deep motor primitive's model proposed that is capable of all previous operations, limiting neural motor primitive's potential applications. This paper proposes a deep movement primitive architecture that encodes all the operations above and uses a Bayesian context aggregator that allows a more sound context conditioning and blending. Our results demonstrate our approach can scale to reproduce complex motions on a larger variety of input choices compared to baselines while maintaining operations of linear movement primitives provide.
    CONFLATOR: Incorporating Switching Point based Rotatory Positional Encodings for Code-Mixed Language Modeling. (arXiv:2309.05270v2 [cs.CL] UPDATED)
    The mixing of two or more languages is called Code-Mixing (CM). CM is a social norm in multilingual societies. Neural Language Models (NLMs) like transformers have been effective on many NLP tasks. However, NLM for CM is an under-explored area. Though transformers are capable and powerful, they cannot always encode positional information since they are non-recurrent. Therefore, to enrich word information and incorporate positional information, positional encoding is defined. We hypothesize that Switching Points (SPs), i.e., junctions in the text where the language switches (L1 -> L2 or L2 -> L1), pose a challenge for CM Language Models (LMs), and hence give special emphasis to SPs in the modeling process. We experiment with several positional encoding mechanisms and show that rotatory positional encodings along with switching point information yield the best results. We introduce CONFLATOR: a neural language modeling approach for code-mixed languages. CONFLATOR tries to learn to emphasize switching points using smarter positional encoding, both at unigram and bigram levels. CONFLATOR outperforms the state-of-the-art on two tasks based on code-mixed Hindi and English (Hinglish): (i) sentiment analysis and (ii) machine translation.
    Evaluating Superhuman Models with Consistency Checks. (arXiv:2306.09983v3 [cs.LG] UPDATED)
    If machine learning models were to achieve superhuman abilities at various reasoning or decision-making tasks, how would we go about evaluating such models, given that humans would necessarily be poor proxies for ground truth? In this paper, we propose a framework for evaluating superhuman models via consistency checks. Our premise is that while the correctness of superhuman decisions may be impossible to evaluate, we can still surface mistakes if the model's decisions fail to satisfy certain logical, human-interpretable rules. We instantiate our framework on three tasks where correctness of decisions is hard to evaluate due to either superhuman model abilities, or to otherwise missing ground truth: evaluating chess positions, forecasting future events, and making legal judgments. We show that regardless of a model's (possibly superhuman) performance on these tasks, we can discover logical inconsistencies in decision making. For example: a chess engine assigning opposing valuations to semantically identical boards; GPT-4 forecasting that sports records will evolve non-monotonically over time; or an AI judge assigning bail to a defendant only after we add a felony to their criminal record.
    Provable Guarantees for Neural Networks via Gradient Feature Learning. (arXiv:2310.12408v1 [cs.LG])
    Neural networks have achieved remarkable empirical performance, while the current theoretical analysis is not adequate for understanding their success, e.g., the Neural Tangent Kernel approach fails to capture their key feature learning ability, while recent analyses on feature learning are typically problem-specific. This work proposes a unified analysis framework for two-layer networks trained by gradient descent. The framework is centered around the principle of feature learning from gradients, and its effectiveness is demonstrated by applications in several prototypical problems, such as mixtures of Gaussians and parity functions. The framework also sheds light on interesting network learning phenomena such as feature learning beyond kernels and the lottery ticket hypothesis.
    Symmetric Neural-Collapse Representations with Supervised Contrastive Loss: The Impact of ReLU and Batching. (arXiv:2306.07960v2 [cs.LG] UPDATED)
    Supervised contrastive loss (SCL) is a competitive and often superior alternative to the cross-entropy loss for classification. While prior studies have demonstrated that both losses yield symmetric training representations under balanced data, this symmetry breaks under class imbalances. This paper presents an intriguing discovery: the introduction of a ReLU activation at the final layer effectively restores the symmetry in SCL-learned representations. We arrive at this finding analytically, by establishing that the global minimizers of an unconstrained features model with SCL loss and entry-wise non-negativity constraints form an orthogonal frame. Extensive experiments conducted across various datasets, architectures, and imbalance scenarios corroborate our finding. Importantly, our experiments reveal that the inclusion of the ReLU activation restores symmetry without compromising test accuracy. This constitutes the first geometry characterization of SCL under imbalances. Additionally, our analysis and experiments underscore the pivotal role of batch selection strategies in representation geometry. By proving necessary and sufficient conditions for mini-batch choices that ensure invariant symmetric representations, we introduce batch-binding as an efficient strategy that guarantees these conditions hold.
    Automatic Prompt Optimization with "Gradient Descent" and Beam Search. (arXiv:2305.03495v2 [cs.CL] UPDATED)
    Large Language Models (LLMs) have shown impressive performance as general purpose agents, but their abilities remain highly dependent on prompts which are hand written with onerous trial-and-error effort. We propose a simple and nonparametric solution to this problem, Automatic Prompt Optimization (APO), which is inspired by numerical gradient descent to automatically improve prompts, assuming access to training data and an LLM API. The algorithm uses minibatches of data to form natural language "gradients" that criticize the current prompt. The gradients are then "propagated" into the prompt by editing the prompt in the opposite semantic direction of the gradient. These gradient descent steps are guided by a beam search and bandit selection procedure which significantly improves algorithmic efficiency. Preliminary results across three benchmark NLP tasks and the novel problem of LLM jailbreak detection suggest that Automatic Prompt Optimization can outperform prior prompt editing techniques and improve an initial prompt's performance by up to 31%, by using data to rewrite vague task descriptions into more precise annotation instructions.
    On the Design Fundamentals of Diffusion Models: A Survey. (arXiv:2306.04542v3 [cs.LG] UPDATED)
    Diffusion models are generative models, which gradually add and remove noise to learn the underlying distribution of training data for data generation. The components of diffusion models have gained significant attention with many design choices proposed. Existing reviews have primarily focused on higher-level solutions, thereby covering less on the design fundamentals of components. This study seeks to address this gap by providing a comprehensive and coherent review on component-wise design choices in diffusion models. Specifically, we organize this review according to their three key components, namely the forward process, the reverse process, and the sampling procedure. This allows us to provide a fine-grained perspective of diffusion models, benefiting future studies in the analysis of individual components, the applicability of design choices, and the implementation of diffusion models.
    Make Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning. (arXiv:2306.00477v4 [cs.CL] UPDATED)
    Parameter-efficient fine-tuning (PEFT) of pre-trained language models (PLMs) has emerged as a highly successful approach, with training only a small number of parameters without sacrificing performance and becoming the de-facto learning paradigm with the increasing size of PLMs. However, existing PEFT methods are not memory-efficient, because they still require caching most of the intermediate activations for the gradient calculation, akin to fine-tuning. One effective way to reduce the activation memory is to apply a reversible model, so the intermediate activations are not necessary to be cached and can be recomputed. Nevertheless, modifying a PLM to its reversible variant is not straightforward, since the reversible model has a distinct architecture from the currently released PLMs. In this paper, we first investigate what is a key factor for the success of existing PEFT methods, and realize that it's essential to preserve the PLM's starting point when initializing a PEFT method. With this finding, we propose memory-efficient fine-tuning (MEFT) that inserts adapters into a PLM, preserving the PLM's starting point and making it reversible without additional pre-training. We evaluate MEFT on the GLUE benchmark and five question-answering tasks with various backbones, BERT, RoBERTa, BART and OPT. MEFT significantly reduces the activation memory up to 84% of full fine-tuning with a negligible amount of trainable parameters. Moreover, MEFT achieves the same score on GLUE and a comparable score on the question-answering tasks as full fine-tuning. A similar finding is also observed for the image classification task.
    ArtWhisperer: A Dataset for Characterizing Human-AI Interactions in Artistic Creations. (arXiv:2306.08141v2 [cs.AI] UPDATED)
    As generative AI becomes more prevalent, it is important to study how human users interact with such models. In this work, we investigate how people use text-to-image models to generate desired target images. To study this interaction, we created ArtWhisperer, an online game where users are given a target image and are tasked with iteratively finding a prompt that creates a similar-looking image as the target. Through this game, we recorded over 50,000 human-AI interactions; each interaction corresponds to one text prompt created by a user and the corresponding generated image. The majority of these are repeated interactions where a user iterates to find the best prompt for their target image, making this a unique sequential dataset for studying human-AI collaborations. In an initial analysis of this dataset, we identify several characteristics of prompt interactions and user strategies. People submit diverse prompts and are able to discover a variety of text descriptions that generate similar images. Interestingly, prompt diversity does not decrease as users find better prompts. We further propose a new metric to quantify the steerability of AI using our dataset. We define steerability as the expected number of interactions required to adequately complete a task. We estimate this value by fitting a Markov chain for each target task and calculating the expected time to reach an adequate score in the Markov chain. We quantify and compare AI steerability across different types of target images and two different models, finding that images of cities and natural world images are more steerable than artistic and fantasy images. These findings provide insights into human-AI interaction behavior, present a concrete method of assessing AI steerability, and demonstrate the general utility of the ArtWhisperer dataset.
    Quasi Manhattan Wasserstein Distance. (arXiv:2310.12498v1 [cs.LG])
    The Quasi Manhattan Wasserstein Distance (QMWD) is a metric designed to quantify the dissimilarity between two matrices by combining elements of the Wasserstein Distance with specific transformations. It offers improved time and space complexity compared to the Manhattan Wasserstein Distance (MWD) while maintaining accuracy. QMWD is particularly advantageous for large datasets or situations with limited computational resources. This article provides a detailed explanation of QMWD, its computation, complexity analysis, and comparisons with WD and MWD.
    Detecting and Mitigating Algorithmic Bias in Binary Classification using Causal Modeling. (arXiv:2310.12421v1 [cs.LG])
    This paper proposes the use of causal modeling to detect and mitigate algorithmic bias. We provide a brief description of causal modeling and a general overview of our approach. We then use the Adult dataset, which is available for download from the UC Irvine Machine Learning Repository, to develop (1) a prediction model, which is treated as a black box, and (2) a causal model for bias mitigation. In this paper, we focus on gender bias and the problem of binary classification. We show that gender bias in the prediction model is statistically significant at the 0.05 level. We demonstrate the effectiveness of the causal model in mitigating gender bias by cross-validation. Furthermore, we show that the overall classification accuracy is improved slightly. Our novel approach is intuitive, easy-to-use, and can be implemented using existing statistical software tools such as "lavaan" in R. Hence, it enhances explainability and promotes trust.
    Online Resource Allocation in Episodic Markov Decision Processes. (arXiv:2305.10744v3 [cs.DS] UPDATED)
    This paper studies a long-term resource allocation problem over multiple periods where each period requires a multi-stage decision-making process. We formulate the problem as an online allocation problem in an episodic finite-horizon constrained Markov decision process with an unknown non-stationary transition function and stochastic non-stationary reward and resource consumption functions. We propose the observe-then-decide regime and improve the existing decide-then-observe regime, while the two settings differ in how the observations and feedback about the reward and resource consumption functions are given to the decision-maker. We develop an online dual mirror descent algorithm that achieves near-optimal regret bounds for both settings. For the observe-then-decide regime, we prove that the expected regret against the dynamic clairvoyant optimal policy is bounded by $\tilde O(\rho^{-1}{H^{3/2}}S\sqrt{AT})$ where $\rho\in(0,1)$ is the budget parameter, $H$ is the length of the horizon, $S$ and $A$ are the numbers of states and actions, and $T$ is the number of episodes. For the decide-then-observe regime, we show that the regret against the static optimal policy that has access to the mean reward and mean resource consumption functions is bounded by $\tilde O(\rho^{-1}{H^{3/2}}S\sqrt{AT})$ with high probability. We test the numerical efficiency of our method for a variant of the resource-constrained inventory management problem.
    Kepler: Robust Learning for Faster Parametric Query Optimization. (arXiv:2306.06798v2 [cs.DB] UPDATED)
    Most existing parametric query optimization (PQO) techniques rely on traditional query optimizer cost models, which are often inaccurate and result in suboptimal query performance. We propose Kepler, an end-to-end learning-based approach to PQO that demonstrates significant speedups in query latency over a traditional query optimizer. Central to our method is Row Count Evolution (RCE), a novel plan generation algorithm based on perturbations in the sub-plan cardinality space. While previous approaches require accurate cost models, we bypass this requirement by evaluating candidate plans via actual execution data and training an ML model to predict the fastest plan given parameter binding values. Our models leverage recent advances in neural network uncertainty in order to robustly predict faster plans while avoiding regressions in query performance. Experimentally, we show that Kepler achieves significant improvements in query runtime on multiple datasets on PostgreSQL.
    Connecting Multi-modal Contrastive Representations. (arXiv:2305.14381v2 [cs.LG] UPDATED)
    Multi-modal Contrastive Representation learning aims to encode different modalities into a semantically aligned shared space. This paradigm shows remarkable generalization ability on numerous downstream tasks across various modalities. However, the reliance on massive high-quality data pairs limits its further development on more modalities. This paper proposes a novel training-efficient method for learning MCR without paired data called Connecting Multi-modal Contrastive Representations (C-MCR). Specifically, given two existing MCRs pre-trained on (A, B) and (B, C) modality pairs, we project them to a new space and use the data from the overlapping modality B to aligning the two MCRs in the new space. Meanwhile, since the modality pairs (A, B) and (B, C) are already aligned within each MCR, the connection learned by overlapping modality can also be transferred to non-overlapping modality pair (A, C). To unleash the potential of C-MCR, we further introduce a semantic-enhanced inter- and intra-MCR connection method. We first enhance the semantic consistency and completion of embeddings across different modalities for more robust alignment. Then we utilize the inter-MCR alignment to establish the connection, and employ the intra-MCR alignment to better maintain the connection for inputs from non-overlapping modalities. To demonstrate the effectiveness of C-MCR, we connect CLIP and CLAP via texts to derive audio-visual representations, and integrate CLIP and ULIP via images for 3D-language representations. Remarkably, without using any paired data, C-MCR for audio-visual achieves state-of-the-art performance on audio-image retrieval, audio-visual source localization, and counterfactual audio-image recognition tasks. Furthermore, C-MCR for 3D-language also attains advanced zero-shot 3D point cloud classification accuracy on ModelNet40.
    Seeing double with a multifunctional reservoir computer. (arXiv:2305.05799v2 [math.DS] UPDATED)
    Multifunctional biological neural networks exploit multistability in order to perform multiple tasks without changing any network properties. Enabling artificial neural networks (ANNs) to obtain certain multistabilities in order to perform several tasks, where each task is related to a particular attractor in the network's state space, naturally has many benefits from a machine learning perspective. Given the association to multistability, in this paper we explore how the relationship between different attractors influences the ability of a reservoir computer (RC), which is a dynamical system in the form of an ANN, to achieve multifunctionality. We construct the `seeing double' problem to systematically study how a RC reconstructs a coexistence of attractors when there is an overlap between them. As the amount of overlap increases, we discover that for multifunctionality to occur, there is a critical dependence on a suitable choice of the spectral radius for the RC's internal network connections. A bifurcation analysis reveals how multifunctionality emerges and is destroyed as the RC enters a chaotic regime that can lead to chaotic itinerancy.
    The Adaptive $\tau$-Lasso: Robustness and Oracle Properties. (arXiv:2304.09310v2 [stat.ML] UPDATED)
    This paper introduces a new regularized version of the robust $\tau$-regression estimator for analyzing high-dimensional datasets subject to gross contamination in the response variables and covariates (explanatory variables). The resulting estimator, termed adaptive $\tau$-Lasso, is robust to outliers and high-leverage points. It also incorporates an adaptive $\ell_1$-norm penalty term, which enables the selection of relevant variables and reduces the bias associated with large true regression coefficients. More specifically, this adaptive $\ell_1$-norm penalty term assigns a weight to each regression coefficient. For a fixed number of predictors $p$, we show that the adaptive $\tau$-Lasso has the oracle property, ensuring both variable-selection consistency and asymptotic normality. Asymptotic normality applies only to the entries of the regression vector corresponding to the true support, assuming knowledge of the true regression vector support. We characterize its robustness via the finite-sample breakdown point and the influence function. We carry out extensive simulations and observe that the class of $\tau$-Lasso estimators exhibits robustness and reliable performance in both contaminated and uncontaminated data settings. We also validate our theoretical findings on robustness properties through simulation experiments. In the face of outliers and high-leverage points, the adaptive $\tau$-Lasso and $\tau$-Lasso estimators achieve the best performance or close-to-best performance in terms of prediction and variable selection accuracy compared to other competing regularized estimators for all scenarios considered in this study. Therefore, the adaptive $\tau$-Lasso and $\tau$-Lasso estimators can be effectively employed for a variety of sparse linear regression problems, particularly in high-dimensional settings and when the data is contaminated by outliers and high-leverage points.
    PGA: Personalizing Grasping Agents with Single Human-Robot Interaction. (arXiv:2310.12547v1 [cs.RO])
    Language-Conditioned Robotic Grasping (LCRG) aims to develop robots that ground and grasp objects based on natural language instructions. While robots capable of recognizing personal objects like "my wallet" can interact more naturally with non-expert users, current LCRG systems primarily limit robots to understanding only generic expressions. To this end, we introduce a task scenario GraspMine with a novel dataset that aims to locate and grasp personal objects given personal indicators via learning from a single human-robot interaction. To address GraspMine, we propose Personalized Grasping Agent (PGA), that learns personal objects by propagating user-given information through a Reminiscence-a collection of raw images from the user's environment. Specifically, PGA acquires personal object information by a user presenting a personal object with its associated indicator, followed by PGA inspecting the object by rotating it. Based on the acquired information, PGA pseudo-labels objects in the Reminiscence by our proposed label propagation algorithm. Harnessing the information acquired from the interactions and the pseudo-labeled objects in the Reminiscence, PGA adapts the object grounding model to grasp personal objects. Experiments on GraspMine show that PGA significantly outperforms baseline methods both in offline and online settings, signifying its effectiveness and personalization applicability on real-world scenarios. Finally, qualitative analysis shows the effectiveness of PGA through a detailed investigation of results in each phase.
    STANLEY: Stochastic Gradient Anisotropic Langevin Dynamics for Learning Energy-Based Models. (arXiv:2310.12667v1 [stat.ML])
    We propose in this paper, STANLEY, a STochastic gradient ANisotropic LangEvin dYnamics, for sampling high dimensional data. With the growing efficacy and potential of Energy-Based modeling, also known as non-normalized probabilistic modeling, for modeling a generative process of different natures of high dimensional data observations, we present an end-to-end learning algorithm for Energy-Based models (EBM) with the purpose of improving the quality of the resulting sampled data points. While the unknown normalizing constant of EBMs makes the training procedure intractable, resorting to Markov Chain Monte Carlo (MCMC) is in general a viable option. Realizing what MCMC entails for the EBM training, we propose in this paper, a novel high dimensional sampling method, based on an anisotropic stepsize and a gradient-informed covariance matrix, embedded into a discretized Langevin diffusion. We motivate the necessity for an anisotropic update of the negative samples in the Markov Chain by the nonlinearity of the backbone of the EBM, here a Convolutional Neural Network. Our resulting method, namely STANLEY, is an optimization algorithm for training Energy-Based models via our newly introduced MCMC method. We provide a theoretical understanding of our sampling scheme by proving that the sampler leads to a geometrically uniformly ergodic Markov Chain. Several image generation experiments are provided in our paper to show the effectiveness of our method.
    One-shot Empirical Privacy Estimation for Federated Learning. (arXiv:2302.03098v4 [cs.LG] UPDATED)
    Privacy estimation techniques for differentially private (DP) algorithms are useful for comparing against analytical bounds, or to empirically measure privacy loss in settings where known analytical bounds are not tight. However, existing privacy auditing techniques usually make strong assumptions on the adversary (e.g., knowledge of intermediate model iterates or the training data distribution), are tailored to specific tasks, model architectures, or DP algorithm, and/or require retraining the model many times (typically on the order of thousands). These shortcomings make deploying such techniques at scale difficult in practice, especially in federated settings where model training can take days or weeks. In this work, we present a novel ``one-shot'' approach that can systematically address these challenges, allowing efficient auditing or estimation of the privacy loss of a model during the same, single training run used to fit model parameters, and without requiring any a priori knowledge about the model architecture, task, or DP training algorithm. We show that our method provides provably correct estimates for the privacy loss under the Gaussian mechanism, and we demonstrate its performance on well-established FL benchmark datasets under several adversarial threat models.
    Topic-Level Bayesian Surprise and Serendipity for Recommender Systems. (arXiv:2308.06368v2 [cs.IR] UPDATED)
    A recommender system that optimizes its recommendations solely to fit a user's history of ratings for consumed items can create a filter bubble, wherein the user does not get to experience items from novel, unseen categories. One approach to mitigate this undesired behavior is to recommend items with high potential for serendipity, namely surprising items that are likely to be highly rated. In this paper, we propose a content-based formulation of serendipity that is rooted in Bayesian surprise and use it to measure the serendipity of items after they are consumed and rated by the user. When coupled with a collaborative-filtering component that identifies similar users, this enables recommending items with high potential for serendipity. To facilitate the evaluation of topic-level models for surprise and serendipity, we introduce a dataset of book reading histories extracted from Goodreads, containing over 26 thousand users and close to 1.3 million books, where we manually annotate 449 books read by 4 users in terms of their time-dependent, topic-level surprise. Experimental evaluations show that models that use Bayesian surprise correlate much better with the manual annotations of topic-level surprise than distance-based heuristics, and also obtain better serendipitous item recommendation performance.
    Generative Pretrained Autoregressive Transformer Graph Neural Network applied to the Analysis and Discovery of Novel Proteins. (arXiv:2305.04934v2 [q-bio.BM] CROSS LISTED)
    We report a flexible language-model based deep learning strategy, applied here to solve complex forward and inverse problems in protein modeling, based on an attention neural network that integrates transformer and graph convolutional architectures in a causal multi-headed graph mechanism, to realize a generative pretrained model. The model is applied to predict secondary structure content (per-residue level and overall content), protein solubility, and sequencing tasks. Further trained on inverse tasks, the model is rendered capable of designing proteins with these properties as target features. The model is formulated as a general framework, completely prompt-based, and can be adapted for a variety of downstream tasks. We find that adding additional tasks yields emergent synergies that the model exploits in improving overall performance, beyond what would be possible by training a model on each dataset alone. Case studies are presented to validate the method, yielding protein designs specifically focused on structural proteins, but also exploring the applicability in the design of soluble, antimicrobial biomaterials. While our model is trained to ultimately perform 8 distinct tasks, with available datasets it can be extended to solve additional problems. In a broader sense, this work illustrates a form of multiscale modeling that relates a set of ultimate building blocks (here, byte-level utf8 characters that define the nature of the physical system at hand) to complex output. This materiomic scheme captures complex emergent relationships between universal building block and resulting properties via a synergizing learning capacity to express a set of potentialities embedded in the knowledge used in training, via the interplay of universality and diversity.
    AdANNS: A Framework for Adaptive Semantic Search. (arXiv:2305.19435v2 [cs.LG] UPDATED)
    Web-scale search systems learn an encoder to embed a given query which is then hooked into an approximate nearest neighbor search (ANNS) pipeline to retrieve similar data points. To accurately capture tail queries and data points, learned representations typically are rigid, high-dimensional vectors that are generally used as-is in the entire ANNS pipeline and can lead to computationally expensive retrieval. In this paper, we argue that instead of rigid representations, different stages of ANNS can leverage adaptive representations of varying capacities to achieve significantly better accuracy-compute trade-offs, i.e., stages of ANNS that can get away with more approximate computation should use a lower-capacity representation of the same data point. To this end, we introduce AdANNS, a novel ANNS design framework that explicitly leverages the flexibility of Matryoshka Representations. We demonstrate state-of-the-art accuracy-compute trade-offs using novel AdANNS-based key ANNS building blocks like search data structures (AdANNS-IVF) and quantization (AdANNS-OPQ). For example on ImageNet retrieval, AdANNS-IVF is up to 1.5% more accurate than the rigid representations-based IVF at the same compute budget; and matches accuracy while being up to 90x faster in wall-clock time. For Natural Questions, 32-byte AdANNS-OPQ matches the accuracy of the 64-byte OPQ baseline constructed using rigid representations -- same accuracy at half the cost! We further show that the gains from AdANNS translate to modern-day composite ANNS indices that combine search structures and quantization. Finally, we demonstrate that AdANNS can enable inference-time adaptivity for compute-aware search on ANNS indices built non-adaptively on matryoshka representations. Code is open-sourced at https://github.com/RAIVNLab/AdANNS.
    A path-norm toolkit for modern networks: consequences, promises and challenges. (arXiv:2310.01225v2 [stat.ML] UPDATED)
    This work introduces the first toolkit around path-norms that is fully able to encompass general DAG ReLU networks with biases, skip connections and any operation based on the extraction of order statistics: max pooling, GroupSort etc. This toolkit notably allows us to establish generalization bounds for modern neural networks that are not only the most widely applicable path-norm based ones, but also recover or beat the sharpest known bounds of this type. These extended path-norms further enjoy the usual benefits of path-norms: ease of computation, invariance under the symmetries of the network, and improved sharpness on feedforward networks compared to the product of operators' norms, another complexity measure most commonly used. The versatility of the toolkit and its ease of implementation allow us to challenge the concrete promises of path-norm-based generalization bounds, by numerically evaluating the sharpest known bounds for ResNets on ImageNet.
    Understanding Multi-phase Optimization Dynamics and Rich Nonlinear Behaviors of ReLU Networks. (arXiv:2305.12467v3 [cs.LG] UPDATED)
    The training process of ReLU neural networks often exhibits complicated nonlinear phenomena. The nonlinearity of models and non-convexity of loss pose significant challenges for theoretical analysis. Therefore, most previous theoretical works on the optimization dynamics of neural networks focus either on local analysis (like the end of training) or approximate linear models (like Neural Tangent Kernel). In this work, we conduct a complete theoretical characterization of the training process of a two-layer ReLU network trained by Gradient Flow on a linearly separable data. In this specific setting, our analysis captures the whole optimization process starting from random initialization to final convergence. Despite the relatively simple model and data that we studied, we reveal four different phases from the whole training process showing a general simplifying-to-complicating learning trend. Specific nonlinear behaviors can also be precisely identified and captured theoretically, such as initial condensation, saddle-to-plateau dynamics, plateau escape, changes of activation patterns, learning with increasing complexity, etc.
    An Introduction to Transformers. (arXiv:2304.10557v4 [cs.LG] UPDATED)
    The transformer is a neural network component that can be used to learn useful representations of sequences or sets of data-points. The transformer has driven recent advances in natural language processing, computer vision, and spatio-temporal modelling. There are many introductions to transformers, but most do not contain precise mathematical descriptions of the architecture and the intuitions behind the design choices are often also missing. Moreover, as research takes a winding path, the explanations for the components of the transformer can be idiosyncratic. In this note we aim for a mathematically precise, intuitive, and clean description of the transformer architecture. We will not discuss training as this is rather standard. We assume that the reader is familiar with fundamental topics in machine learning including multi-layer perceptrons, linear transformations, softmax functions and basic probability.
    Relational Self-Supervised Learning. (arXiv:2203.08717v2 [cs.CV] UPDATED)
    Self-supervised Learning (SSL) including the mainstream contrastive learning has achieved great success in learning visual representations without data annotations. However, most methods mainly focus on the instance level information (\ie, the different augmented images of the same instance should have the same feature or cluster into the same class), but there is a lack of attention on the relationships between different instances. In this paper, we introduce a novel SSL paradigm, which we term as relational self-supervised learning (ReSSL) framework that learns representations by modeling the relationship between different instances. Specifically, our proposed method employs sharpened distribution of pairwise similarities among different instances as \textit{relation} metric, which is thus utilized to match the feature embeddings of different augmentations. To boost the performance, we argue that weak augmentations matter to represent a more reliable relation, and leverage momentum strategy for practical efficiency. The designed asymmetric predictor head and an InfoNCE warm-up strategy enhance the robustness to hyper-parameters and benefit the resulting performance. Experimental results show that our proposed ReSSL substantially outperforms the state-of-the-art methods across different network architectures, including various lightweight networks (\eg, EfficientNet and MobileNet).
    EDGI: Equivariant Diffusion for Planning with Embodied Agents. (arXiv:2303.12410v2 [cs.LG] UPDATED)
    Embodied agents operate in a structured world, often solving tasks with spatial, temporal, and permutation symmetries. Most algorithms for planning and model-based reinforcement learning (MBRL) do not take this rich geometric structure into account, leading to sample inefficiency and poor generalization. We introduce the Equivariant Diffuser for Generating Interactions (EDGI), an algorithm for MBRL and planning that is equivariant with respect to the product of the spatial symmetry group SE(3), the discrete-time translation group Z, and the object permutation group Sn. EDGI follows the Diffuser framework (Janner et al., 2022) in treating both learning a world model and planning in it as a conditional generative modeling problem, training a diffusion model on an offline trajectory dataset. We introduce a new SE(3)xZxSn-equivariant diffusion model that supports multiple representations. We integrate this model in a planning loop, where conditioning and classifier guidance let us softly break the symmetry for specific tasks as needed. On object manipulation and navigation tasks, EDGI is substantially more sample efficient and generalizes better across the symmetry group than non-equivariant models.
    Data Augmentation for Time-Series Classification: An Extensive Empirical Study and Comprehensive Survey. (arXiv:2310.10060v2 [cs.LG] UPDATED)
    Data Augmentation (DA) has emerged as an indispensable strategy in Time Series Classification (TSC), primarily due to its capacity to amplify training samples, thereby bolstering model robustness, diversifying datasets, and curtailing overfitting. However, the current landscape of DA in TSC is plagued with fragmented literature reviews, nebulous methodological taxonomies, inadequate evaluative measures, and a dearth of accessible, user-oriented tools. In light of these challenges, this study embarks on an exhaustive dissection of DA methodologies within the TSC realm. Our initial approach involved an extensive literature review spanning a decade, revealing that contemporary surveys scarcely capture the breadth of advancements in DA for TSC, prompting us to meticulously analyze over 100 scholarly articles to distill more than 60 unique DA techniques. This rigorous analysis precipitated the formulation of a novel taxonomy, purpose-built for the intricacies of DA in TSC, categorizing techniques into five principal echelons: Transformation-Based, Pattern-Based, Generative, Decomposition-Based, and Automated Data Augmentation. Our taxonomy promises to serve as a robust navigational aid for scholars, offering clarity and direction in method selection. Addressing the conspicuous absence of holistic evaluations for prevalent DA techniques, we executed an all-encompassing empirical assessment, wherein upwards of 15 DA strategies were subjected to scrutiny across 8 UCR time-series datasets, employing ResNet and a multi-faceted evaluation paradigm encompassing Accuracy, Method Ranking, and Residual Analysis, yielding a benchmark accuracy of 88.94 +- 11.83%. Our investigation underscored the inconsistent efficacies of DA techniques, with...
    Can Brain Signals Reveal Inner Alignment with Human Languages?. (arXiv:2208.06348v4 [q-bio.NC] UPDATED)
    Brain Signals, such as Electroencephalography (EEG), and human languages have been widely explored independently for many downstream tasks, however, the connection between them has not been well explored. In this study, we explore the relationship and dependency between EEG and language. To study at the representation level, we introduced \textbf{MTAM}, a \textbf{M}ultimodal \textbf{T}ransformer \textbf{A}lignment \textbf{M}odel, to observe coordinated representations between the two modalities. We used various relationship alignment-seeking techniques, such as Canonical Correlation Analysis and Wasserstein Distance, as loss functions to transfigure features. On downstream applications, sentiment analysis and relation detection, we achieved new state-of-the-art results on two datasets, ZuCo and K-EmoCon. Our method achieved an F1-score improvement of 1.7% on K-EmoCon and 9.3% on Zuco datasets for sentiment analysis, and 7.4% on ZuCo for relation detection. In addition, we provide interpretations of the performance improvement: (1) feature distribution shows the effectiveness of the alignment module for discovering and encoding the relationship between EEG and language; (2) alignment weights show the influence of different language semantics as well as EEG frequency features; (3) brain topographical maps provide an intuitive demonstration of the connectivity in the brain regions. Our code is available at \url{https://github.com/Jason-Qiu/EEG_Language_Alignment}.
    DCSI -- An improved measure of cluster separability based on separation and connectedness. (arXiv:2310.12806v1 [stat.ML])
    Whether class labels in a given data set correspond to meaningful clusters is crucial for the evaluation of clustering algorithms using real-world data sets. This property can be quantified by separability measures. A review of the existing literature shows that neither classification-based complexity measures nor cluster validity indices (CVIs) adequately incorporate the central aspects of separability for density-based clustering: between-class separation and within-class connectedness. A newly developed measure (density cluster separability index, DCSI) aims to quantify these two characteristics and can also be used as a CVI. Extensive experiments on synthetic data indicate that DCSI correlates strongly with the performance of DBSCAN measured via the adjusted rand index (ARI) but lacks robustness when it comes to multi-class data sets with overlapping classes that are ill-suited for density-based hard clustering. Detailed evaluation on frequently used real-world data sets shows that DCSI can correctly identify touching or overlapping classes that do not form meaningful clusters.
    Language-Guided Traffic Simulation via Scene-Level Diffusion. (arXiv:2306.06344v2 [cs.RO] UPDATED)
    Realistic and controllable traffic simulation is a core capability that is necessary to accelerate autonomous vehicle (AV) development. However, current approaches for controlling learning-based traffic models require significant domain expertise and are difficult for practitioners to use. To remedy this, we present CTG++, a scene-level conditional diffusion model that can be guided by language instructions. Developing this requires tackling two challenges: the need for a realistic and controllable traffic model backbone, and an effective method to interface with a traffic model using language. To address these challenges, we first propose a scene-level diffusion model equipped with a spatio-temporal transformer backbone, which generates realistic and controllable traffic. We then harness a large language model (LLM) to convert a user's query into a loss function, guiding the diffusion model towards query-compliant generation. Through comprehensive evaluation, we demonstrate the effectiveness of our proposed method in generating realistic, query-compliant traffic simulations.
    Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale. (arXiv:2306.15687v2 [eess.AS] UPDATED)
    Large-scale generative models such as GPT and DALL-E have revolutionized the research community. These models not only generate high fidelity outputs, but are also generalists which can solve tasks not explicitly taught. In contrast, speech generative models are still primitive in terms of scale and task generalization. In this paper, we present Voicebox, the most versatile text-guided generative model for speech at scale. Voicebox is a non-autoregressive flow-matching model trained to infill speech, given audio context and text, trained on over 50K hours of speech that are not filtered or enhanced. Similar to GPT, Voicebox can perform many different tasks through in-context learning, but is more flexible as it can also condition on future context. Voicebox can be used for mono or cross-lingual zero-shot text-to-speech synthesis, noise removal, content editing, style conversion, and diverse sample generation. In particular, Voicebox outperforms the state-of-the-art zero-shot TTS model VALL-E on both intelligibility (5.9% vs 1.9% word error rates) and audio similarity (0.580 vs 0.681) while being up to 20 times faster. Audio samples can be found in \url{https://voicebox.metademolab.com}.
    ROMO: Retrieval-enhanced Offline Model-based Optimization. (arXiv:2310.07560v2 [cs.LG] UPDATED)
    Data-driven black-box model-based optimization (MBO) problems arise in a great number of practical application scenarios, where the goal is to find a design over the whole space maximizing a black-box target function based on a static offline dataset. In this work, we consider a more general but challenging MBO setting, named constrained MBO (CoMBO), where only part of the design space can be optimized while the rest is constrained by the environment. A new challenge arising from CoMBO is that most observed designs that satisfy the constraints are mediocre in evaluation. Therefore, we focus on optimizing these mediocre designs in the offline dataset while maintaining the given constraints rather than further boosting the best observed design in the traditional MBO setting. We propose retrieval-enhanced offline model-based optimization (ROMO), a new derivable forward approach that retrieves the offline dataset and aggregates relevant samples to provide a trusted prediction, and use it for gradient-based optimization. ROMO is simple to implement and outperforms state-of-the-art approaches in the CoMBO setting. Empirically, we conduct experiments on a synthetic Hartmann (3D) function dataset, an industrial CIO dataset, and a suite of modified tasks in the Design-Bench benchmark. Results show that ROMO performs well in a wide range of constrained optimization tasks.
    INSTA-BNN: Binary Neural Network with INSTAnce-aware Threshold. (arXiv:2204.07439v3 [cs.CV] UPDATED)
    Binary Neural Networks (BNNs) have emerged as a promising solution for reducing the memory footprint and compute costs of deep neural networks, but they suffer from quality degradation due to the lack of freedom as activations and weights are constrained to the binary values. To compensate for the accuracy drop, we propose a novel BNN design called Binary Neural Network with INSTAnce-aware threshold (INSTA-BNN), which controls the quantization threshold dynamically in an input-dependent or instance-aware manner. According to our observation, higher-order statistics can be a representative metric to estimate the characteristics of the input distribution. INSTA-BNN is designed to adjust the threshold dynamically considering various information, including higher-order statistics, but it is also optimized judiciously to realize minimal overhead on a real device. Our extensive study shows that INSTA-BNN outperforms the baseline by 3.0% and 2.8% on the ImageNet classification task with comparable computing cost, achieving 68.5% and 72.2% top-1 accuracy on ResNet-18 and MobileNetV1 based models, respectively.
    Discretize Relaxed Solution of Spectral Clustering via a Non-Heuristic Algorithm. (arXiv:2310.12752v1 [cs.LG])
    Spectral clustering and its extensions usually consist of two steps: (1) constructing a graph and computing the relaxed solution; (2) discretizing relaxed solutions. Although the former has been extensively investigated, the discretization techniques are mainly heuristic methods, e.g., k-means, spectral rotation. Unfortunately, the goal of the existing methods is not to find a discrete solution that minimizes the original objective. In other words, the primary drawback is the neglect of the original objective when computing the discrete solution. Inspired by the first-order optimization algorithms, we propose to develop a first-order term to bridge the original problem and discretization algorithm, which is the first non-heuristic to the best of our knowledge. Since the non-heuristic method is aware of the original graph cut problem, the final discrete solution is more reliable and achieves the preferable loss value. We also theoretically show that the continuous optimum is beneficial to discretization algorithms though simply finding its closest discrete solution is an existing heuristic algorithm which is also unreliable. Sufficient experiments significantly show the superiority of our method.
    Fairness in Streaming Submodular Maximization over a Matroid Constraint. (arXiv:2305.15118v2 [cs.LG] UPDATED)
    Streaming submodular maximization is a natural model for the task of selecting a representative subset from a large-scale dataset. If datapoints have sensitive attributes such as gender or race, it becomes important to enforce fairness to avoid bias and discrimination. This has spurred significant interest in developing fair machine learning algorithms. Recently, such algorithms have been developed for monotone submodular maximization under a cardinality constraint. In this paper, we study the natural generalization of this problem to a matroid constraint. We give streaming algorithms as well as impossibility results that provide trade-offs between efficiency, quality and fairness. We validate our findings empirically on a range of well-known real-world applications: exemplar-based clustering, movie recommendation, and maximum coverage in social networks.
    PEFT-Ref: A Modular Reference Architecture and Typology for Parameter-Efficient Finetuning Techniques. (arXiv:2304.12410v2 [cs.CL] UPDATED)
    Recent parameter-efficient finetuning (PEFT) techniques aim to improve over the considerable cost of fully finetuning large pretrained language models (PLM). As different PEFT techniques proliferate, it is becoming difficult to compare them, in particular in terms of (i) the structure and functionality they add to the PLM, (ii) the different types and degrees of efficiency improvements achieved, (iii) performance at different downstream tasks, and (iv) how differences in structure and functionality relate to efficiency and task performance. To facilitate such comparisons, this paper presents a reference architecture which standardises aspects shared by different PEFT techniques, while isolating differences to specific locations and interactions with the standard components. Through this process of standardising and isolating differences, a modular view of PEFT techniques emerges, supporting not only direct comparison of different techniques and their efficiency and task performance, but also systematic exploration of reusability and composability of the different types of finetuned modules. We demonstrate how the reference architecture can be applied to understand properties and relative advantages of PEFT techniques, hence to inform selection of techniques for specific tasks, and design choices for new PEFT techniques.
    Hybrid Search for Efficient Planning with Completeness Guarantees. (arXiv:2310.12819v1 [cs.AI])
    Solving complex planning problems has been a long-standing challenge in computer science. Learning-based subgoal search methods have shown promise in tackling these problems, but they often suffer from a lack of completeness guarantees, meaning that they may fail to find a solution even if one exists. In this paper, we propose an efficient approach to augment a subgoal search method to achieve completeness in discrete action spaces. Specifically, we augment the high-level search with low-level actions to execute a multi-level (hybrid) search, which we call complete subgoal search. This solution achieves the best of both worlds: the practical efficiency of high-level search and the completeness of low-level search. We apply the proposed search method to a recently proposed subgoal search algorithm and evaluate the algorithm trained on offline data on complex planning problems. We demonstrate that our complete subgoal search not only guarantees completeness but can even improve performance in terms of search expansions for instances that the high-level could solve without low-level augmentations. Our approach makes it possible to apply subgoal-level planning for systems where completeness is a critical requirement.
    When Rigidity Hurts: Soft Consistency Regularization for Probabilistic Hierarchical Time Series Forecasting. (arXiv:2206.07940v4 [cs.LG] UPDATED)
    Probabilistic hierarchical time-series forecasting is an important variant of time-series forecasting, where the goal is to model and forecast multivariate time-series that have underlying hierarchical relations. Most methods focus on point predictions and do not provide well-calibrated probabilistic forecasts distributions. Recent state-of-art probabilistic forecasting methods also impose hierarchical relations on point predictions and samples of distribution which does not account for coherency of forecast distributions. Previous works also silently assume that datasets are always consistent with given hierarchical relations and do not adapt to real-world datasets that show deviation from this assumption. We close both these gap and propose PROFHiT, which is a fully probabilistic hierarchical forecasting model that jointly models forecast distribution of entire hierarchy. PROFHiT uses a flexible probabilistic Bayesian approach and introduces a novel Distributional Coherency regularization to learn from hierarchical relations for entire forecast distribution that enables robust and calibrated forecasts as well as adapt to datasets of varying hierarchical consistency. On evaluating PROFHiT over wide range of datasets, we observed 41-88% better performance in accuracy and significantly better calibration. Due to modeling the coherency over full distribution, we observed that PROFHiT can robustly provide reliable forecasts even if up to 10% of input time-series data is missing where other methods' performance severely degrade by over 70%.
    Survival of the Most Influential Prompts: Efficient Black-Box Prompt Search via Clustering and Pruning. (arXiv:2310.12774v1 [cs.CL])
    Prompt-based learning has been an effective paradigm for large pretrained language models (LLM), enabling few-shot or even zero-shot learning. Black-box prompt search has received growing interest recently for its distinctive properties of gradient-free optimization, proven particularly useful and powerful for model-as-a-service usage. However, the discrete nature and the complexity of combinatorial optimization hinder the efficiency of modern black-box approaches. Despite extensive research on search algorithms, the crucial aspect of search space design and optimization has been largely overlooked. In this paper, we first conduct a sensitivity analysis by prompting LLM, revealing that only a small number of tokens exert a disproportionate amount of influence on LLM predictions. Leveraging this insight, we propose the Clustering and Pruning for Efficient Black-box Prompt Search (ClaPS), a simple black-box search method that first clusters and prunes the search space to focus exclusively on influential prompt tokens. By employing even simple search methods within the pruned search space, ClaPS achieves state-of-the-art performance across various tasks and LLMs, surpassing the performance of complex approaches while significantly reducing search costs. Our findings underscore the critical role of search space design and optimization in enhancing both the usefulness and the efficiency of black-box prompt-based learning.
    Tracking electricity losses and their perceived causes using nighttime light and social media. (arXiv:2310.12346v1 [physics.soc-ph])
    Urban environments are intricate systems where the breakdown of critical infrastructure can impact both the economic and social well-being of communities. Electricity systems hold particular significance, as they are essential for other infrastructure, and disruptions can trigger widespread consequences. Typically, assessing electricity availability requires ground-level data, a challenge in conflict zones and regions with limited access. This study shows how satellite imagery, social media, and information extraction can monitor blackouts and their perceived causes. Night-time light data (in March 2019 for Caracas, Venezuela) is used to indicate blackout regions. Twitter data is used to determine sentiment and topic trends, while statistical analysis and topic modeling delved into public perceptions regarding blackout causes. The findings show an inverse relationship between nighttime light intensity. Tweets mentioning the Venezuelan President displayed heightened negativity and a greater prevalence of blame-related terms, suggesting a perception of government accountability for the outages.
    Causal-structure Driven Augmentations for Text OOD Generalization. (arXiv:2310.12803v1 [cs.LG])
    The reliance of text classifiers on spurious correlations can lead to poor generalization at deployment, raising concerns about their use in safety-critical domains such as healthcare. In this work, we propose to use counterfactual data augmentation, guided by knowledge of the causal structure of the data, to simulate interventions on spurious features and to learn more robust text classifiers. We show that this strategy is appropriate in prediction problems where the label is spuriously correlated with an attribute. Under the assumptions of such problems, we discuss the favorable sample complexity of counterfactual data augmentation, compared to importance re-weighting. Pragmatically, we match examples using auxiliary data, based on diff-in-diff methodology, and use a large language model (LLM) to represent a conditional probability of text. Through extensive experimentation on learning caregiver-invariant predictors of clinical diagnoses from medical narratives and on semi-synthetic data, we demonstrate that our method for simulating interventions improves out-of-distribution (OOD) accuracy compared to baseline invariant learning algorithms.
    Audio Editing with Non-Rigid Text Prompts. (arXiv:2310.12858v1 [cs.SD])
    In this paper, we explore audio-editing with non-rigid text edits. We show that the proposed editing pipeline is able to create audio edits that remain faithful to the input audio. We explore text prompts that perform addition, style transfer, and in-painting. We quantitatively and qualitatively show that the edits are able to obtain results which outperform Audio-LDM, a recently released text-prompted audio generation model. Qualitative inspection of the results points out that the edits given by our approach remain more faithful to the input audio in terms of keeping the original onsets and offsets of the audio events.
    Neural Relation Graph: A Unified Framework for Identifying Label Noise and Outlier Data. (arXiv:2301.12321v4 [cs.LG] UPDATED)
    Diagnosing and cleaning data is a crucial step for building robust machine learning systems. However, identifying problems within large-scale datasets with real-world distributions is challenging due to the presence of complex issues such as label errors, under-representation, and outliers. In this paper, we propose a unified approach for identifying the problematic data by utilizing a largely ignored source of information: a relational structure of data in the feature-embedded space. To this end, we present scalable and effective algorithms for detecting label errors and outlier data based on the relational graph structure of data. We further introduce a visualization tool that provides contextual information of a data point in the feature-embedded space, serving as an effective tool for interactively diagnosing data. We evaluate the label error and outlier/out-of-distribution (OOD) detection performances of our approach on the large-scale image, speech, and language domain tasks, including ImageNet, ESC-50, and SST2. Our approach achieves state-of-the-art detection performance on all tasks considered and demonstrates its effectiveness in debugging large-scale real-world datasets across various domains. We release codes at https://github.com/snu-mllab/Neural-Relation-Graph.
    Explanation-Based Training with Differentiable Insertion/Deletion Metric-Aware Regularizers. (arXiv:2310.12553v1 [cs.LG])
    The quality of explanations for the predictions of complex machine learning predictors is often measured using insertion and deletion metrics, which assess the faithfulness of the explanations, i.e., how correctly the explanations reflect the predictor's behavior. To improve the faithfulness, we propose insertion/deletion metric-aware explanation-based optimization (ID-ExpO), which optimizes differentiable predictors to improve both insertion and deletion scores of the explanations while keeping their predictive accuracy. Since the original insertion and deletion metrics are indifferentiable with respect to the explanations and directly unavailable for gradient-based optimization, we extend the metrics to be differentiable and use them to formalize insertion and deletion metric-based regularizers. The experimental results on image and tabular datasets show that the deep neural networks-based predictors fine-tuned using ID-ExpO enable popular post-hoc explainers to produce more faithful and easy-to-interpret explanations while keeping high predictive accuracy.
    Example-based Hypernetworks for Out-of-Distribution Generalization. (arXiv:2203.14276v3 [cs.CL] UPDATED)
    As Natural Language Processing (NLP) algorithms continually achieve new milestones, out-of-distribution generalization remains a significant challenge. This paper addresses the issue of multi-source adaptation for unfamiliar domains: We leverage labeled data from multiple source domains to generalize to unknown target domains at training. Our innovative framework employs example-based Hypernetwork adaptation: a T5 encoder-decoder initially generates a unique signature from an input example, embedding it within the source domains' semantic space. This signature is subsequently utilized by a Hypernetwork to generate the task classifier's weights. We evaluated our method across two tasks - sentiment classification and natural language inference - in 29 adaptation scenarios, where it outpaced established algorithms. In an advanced version, the signature also enriches the input example's representation. We also compare our finetuned architecture to few-shot GPT-3, demonstrating its effectiveness in essential use cases. To our knowledge, this marks the first application of Hypernetworks to the adaptation for unknown domains.
    KwaiYiiMath: Technical Report. (arXiv:2310.07488v2 [cs.CL] UPDATED)
    Recent advancements in large language models (LLMs) have demonstrated remarkable abilities in handling a variety of natural language processing (NLP) downstream tasks, even on mathematical tasks requiring multi-step reasoning. In this report, we introduce the KwaiYiiMath which enhances the mathematical reasoning abilities of KwaiYiiBase1, by applying Supervised Fine-Tuning (SFT) and Reinforced Learning from Human Feedback (RLHF), including on both English and Chinese mathematical tasks. Meanwhile, we also constructed a small-scale Chinese primary school mathematics test set (named KMath), consisting of 188 examples to evaluate the correctness of the problem-solving process generated by the models. Empirical studies demonstrate that KwaiYiiMath can achieve state-of-the-art (SOTA) performance on GSM8k, CMath, and KMath compared with the similar size models, respectively.
    Convergence of policy gradient methods for finite-horizon stochastic linear-quadratic control problems. (arXiv:2211.00617v2 [math.OC] UPDATED)
    We study the global linear convergence of policy gradient (PG) methods for finite-horizon continuous-time exploratory linear-quadratic control (LQC) problems. The setting includes stochastic LQC problems with indefinite costs and allows additional entropy regularisers in the objective. We consider a continuous-time Gaussian policy whose mean is linear in the state variable and whose covariance is state-independent. Contrary to discrete-time problems, the cost is noncoercive in the policy and not all descent directions lead to bounded iterates. We propose geometry-aware gradient descents for the mean and covariance of the policy using the Fisher geometry and the Bures-Wasserstein geometry, respectively. The policy iterates are shown to satisfy an a-priori bound, and converge globally to the optimal policy with a linear rate. We further propose a novel PG method with discrete-time policies. The algorithm leverages the continuous-time analysis, and achieves a robust linear convergence across different action frequencies. A numerical experiment confirms the convergence and robustness of the proposed algorithm.
    IC3: Image Captioning by Committee Consensus. (arXiv:2302.01328v3 [cs.CV] UPDATED)
    If you ask a human to describe an image, they might do so in a thousand different ways. Traditionally, image captioning models are trained to generate a single "best" (most like a reference) image caption. Unfortunately, doing so encourages captions that are "informationally impoverished," and focus on only a subset of the possible details, while ignoring other potentially useful information in the scene. In this work, we introduce a simple, yet novel, method: "Image Captioning by Committee Consensus" (IC3), designed to generate a single caption that captures high-level details from several annotator viewpoints. Humans rate captions produced by IC3 at least as helpful as baseline SOTA models more than two thirds of the time, and IC3 can improve the performance of SOTA automated recall systems by up to 84%, outperforming single human-generated reference captions, and indicating significant improvements over SOTA approaches for visual description. Code is available at https://davidmchan.github.io/caption-by-committee/
    An Information-Theoretic Analysis of Compute-Optimal Neural Scaling Laws. (arXiv:2212.01365v2 [cs.LG] UPDATED)
    We study the compute-optimal trade-off between model and training data set sizes for large neural networks. Our result suggests a linear relation similar to that supported by the empirical analysis of chinchilla. While that work studies transformer-based large language models trained on the MassiveText corpus gopher, as a starting point for development of a mathematical theory, we focus on a simpler learning model and data generating process, each based on a neural network with a sigmoidal output unit and single hidden layer of ReLU activation units. We introduce general error upper bounds for a class of algorithms which incrementally update a statistic (for example gradient descent). For a particular learning model inspired by barron 1993, we establish an upper bound on the minimal information-theoretically achievable expected error as a function of model and data set sizes. We then derive allocations of computation that minimize this bound. We present empirical results which suggest that this approximation correctly identifies an asymptotic linear compute-optimal scaling. This approximation also generates new insights. Among other things, it suggests that, as the input dimension or latent space complexity grows, as might be the case for example if a longer history of tokens is taken as input to a language model, a larger fraction of the compute budget should be allocated to growing the learning model rather than training data.
    Speaking Style Conversion in the Waveform Domain Using Discrete Self-Supervised Units. (arXiv:2212.09730v2 [cs.SD] UPDATED)
    We introduce DISSC, a novel, lightweight method that converts the rhythm, pitch contour and timbre of a recording to a target speaker in a textless manner. Unlike DISSC, most voice conversion (VC) methods focus primarily on timbre, and ignore people's unique speaking style (prosody). The proposed approach uses a pretrained, self-supervised model for encoding speech to discrete units, which makes it simple, effective, and fast to train. All conversion modules are only trained on reconstruction like tasks, thus suitable for any-to-many VC with no paired data. We introduce a suite of quantitative and qualitative evaluation metrics for this setup, and empirically demonstrate that DISSC significantly outperforms the evaluated baselines. Code and samples are available at https://pages.cs.huji.ac.il/adiyoss-lab/dissc/.
    Named Entity Recognition for Monitoring Plant Health Threats in Tweets: a ChouBERT Approach. (arXiv:2310.12522v1 [cs.CL])
    An important application scenario of precision agriculture is detecting and measuring crop health threats using sensors and data analysis techniques. However, the textual data are still under-explored among the existing solutions due to the lack of labelled data and fine-grained semantic resources. Recent research suggests that the increasing connectivity of farmers and the emergence of online farming communities make social media like Twitter a participatory platform for detecting unfamiliar plant health events if we can extract essential information from unstructured textual data. ChouBERT is a French pre-trained language model that can identify Tweets concerning observations of plant health issues with generalizability on unseen natural hazards. This paper tackles the lack of labelled data by further studying ChouBERT's know-how on token-level annotation tasks over small labeled sets.
    Reinforcement Learning and Bandits for Speech and Language Processing: Tutorial, Review and Outlook. (arXiv:2210.13623v3 [cs.AI] UPDATED)
    In recent years, reinforcement learning and bandits have transformed a wide range of real-world applications including healthcare, finance, recommendation systems, robotics, and last but not least, the speech and natural language processing. While most speech and language applications of reinforcement learning algorithms are centered around improving the training of deep neural networks with its flexible optimization properties, there are still many grounds to explore to utilize the benefits of reinforcement learning, such as its reward-driven adaptability, state representations, temporal structures and generalizability. In this survey, we present an overview of recent advancements of reinforcement learning and bandits, and discuss how they can be effectively employed to solve speech and natural language processing problems with models that are adaptive, interactive and scalable.
    Deep Discriminative to Kernel Density Networks for Calibrated Inference. (arXiv:2201.13001v6 [cs.LG] UPDATED)
    Deep discriminative approaches like random forests and deep neural networks have recently found applications in many important real-world scenarios. However, deploying these learning algorithms in safety-critical applications raises concerns, particularly when it comes to ensuring confidence calibration for both in-distribution and out-of-distribution data points. Many popular methods for in-distribution (ID) calibration, such as isotonic regression and Platt's sigmoidal regression, exhibit excellent ID calibration performance but often at the cost of classification accuracy. Moreover, these methods are not calibrated for the entire feature space, leading to overconfidence in the case of out-of-distribution (OOD) samples. In this paper, we leveraged the fact that deep models, including both random forests and deep-nets, learn internal representations which are unions of polytopes with affine activation functions to conceptualize them both as partitioning rules of the feature space. We replace the affine function in each polytope populated by the training data with a Gaussian kernel. We propose sufficient conditions for our proposed methods to be consistent estimators of the corresponding class conditional densities. Moreover, our experiments on both tabular and vision benchmarks show that the proposed approaches obtain well-calibrated posteriors while mostly preserving or improving the classification accuracy of the original algorithm for in-distribution region, and extrapolates beyond the training data to handle out-of-distribution inputs appropriately.
    Multi-label Node Classification On Graph-Structured Data. (arXiv:2304.10398v3 [cs.LG] UPDATED)
    Graph Neural Networks (GNNs) have shown state-of-the-art improvements in node classification tasks on graphs. While these improvements have been largely demonstrated in a multi-class classification scenario, a more general and realistic scenario in which each node could have multiple labels has so far received little attention. The first challenge in conducting focused studies on multi-label node classification is the limited number of publicly available multi-label graph datasets. Therefore, as our first contribution, we collect and release three real-world biological datasets and develop a multi-label graph generator to generate datasets with tunable properties. While high label similarity (high homophily) is usually attributed to the success of GNNs, we argue that a multi-label scenario does not follow the usual semantics of homophily and heterophily so far defined for a multi-class scenario. As our second contribution, we define homophily and Cross-Class Neighborhood Similarity for the multi-label scenario and provide a thorough analyses of the collected $9$ multi-label datasets. Finally, we perform a large-scale comparative study with $8$ methods and $9$ datasets and analyse the performances of the methods to assess the progress made by current state of the art in the multi-label node classification scenario. We release our benchmark at https://github.com/Tianqi-py/MLGNC.
    A Quasi-Wasserstein Loss for Learning Graph Neural Networks. (arXiv:2310.11762v2 [cs.LG] UPDATED)
    When learning graph neural networks (GNNs) in node-level prediction tasks, most existing loss functions are applied for each node independently, even if node embeddings and their labels are non-i.i.d. because of their graph structures. To eliminate such inconsistency, in this study we propose a novel Quasi-Wasserstein (QW) loss with the help of the optimal transport defined on graphs, leading to new learning and prediction paradigms of GNNs. In particular, we design a "Quasi-Wasserstein" distance between the observed multi-dimensional node labels and their estimations, optimizing the label transport defined on graph edges. The estimations are parameterized by a GNN in which the optimal label transport may determine the graph edge weights optionally. By reformulating the strict constraint of the label transport to a Bregman divergence-based regularizer, we obtain the proposed Quasi-Wasserstein loss associated with two efficient solvers learning the GNN together with optimal label transport. When predicting node labels, our model combines the output of the GNN with the residual component provided by the optimal label transport, leading to a new transductive prediction paradigm. Experiments show that the proposed QW loss applies to various GNNs and helps to improve their performance in node-level classification and regression tasks.
    Blind quantum machine learning with quantum bipartite correlator. (arXiv:2310.12893v1 [quant-ph])
    Distributed quantum computing is a promising computational paradigm for performing computations that are beyond the reach of individual quantum devices. Privacy in distributed quantum computing is critical for maintaining confidentiality and protecting the data in the presence of untrusted computing nodes. In this work, we introduce novel blind quantum machine learning protocols based on the quantum bipartite correlator algorithm. Our protocols have reduced communication overhead while preserving the privacy of data from untrusted parties. We introduce robust algorithm-specific privacy-preserving mechanisms with low computational overhead that do not require complex cryptographic techniques. We then validate the effectiveness of the proposed protocols through complexity and privacy analysis. Our findings pave the way for advancements in distributed quantum computing, opening up new possibilities for privacy-aware machine learning applications in the era of quantum technologies.
    Prompt Injection Attacks and Defenses in LLM-Integrated Applications. (arXiv:2310.12815v1 [cs.CR])
    Large Language Models (LLMs) are increasingly deployed as the backend for a variety of real-world applications called LLM-Integrated Applications. Multiple recent works showed that LLM-Integrated Applications are vulnerable to prompt injection attacks, in which an attacker injects malicious instruction/data into the input of those applications such that they produce results as the attacker desires. However, existing works are limited to case studies. As a result, the literature lacks a systematic understanding of prompt injection attacks and their defenses. We aim to bridge the gap in this work. In particular, we propose a general framework to formalize prompt injection attacks. Existing attacks, which are discussed in research papers and blog posts, are special cases in our framework. Our framework enables us to design a new attack by combining existing attacks. Moreover, we also propose a framework to systematize defenses against prompt injection attacks. Using our frameworks, we conduct a systematic evaluation on prompt injection attacks and their defenses with 10 LLMs and 7 tasks. We hope our frameworks can inspire future research in this field. Our code is available at https://github.com/liu00222/Open-Prompt-Injection.
    Fine-Tuning Generative Models as an Inference Method for Robotic Tasks. (arXiv:2310.12862v1 [cs.LG])
    Adaptable models could greatly benefit robotic agents operating in the real world, allowing them to deal with novel and varying conditions. While approaches such as Bayesian inference are well-studied frameworks for adapting models to evidence, we build on recent advances in deep generative models which have greatly affected many areas of robotics. Harnessing modern GPU acceleration, we investigate how to quickly adapt the sample generation of neural network models to observations in robotic tasks. We propose a simple and general method that is applicable to various deep generative models and robotic environments. The key idea is to quickly fine-tune the model by fitting it to generated samples matching the observed evidence, using the cross-entropy method. We show that our method can be applied to both autoregressive models and variational autoencoders, and demonstrate its usability in object shape inference from grasping, inverse kinematics calculation, and point cloud completion.
    Communication-Efficient On-Device Machine Learning: Federated Distillation and Augmentation under Non-IID Private Data. (arXiv:1811.11479v2 [cs.LG] UPDATED)
    On-device machine learning (ML) enables the training process to exploit a massive amount of user-generated private data samples. To enjoy this benefit, inter-device communication overhead should be minimized. With this end, we propose federated distillation (FD), a distributed model training algorithm whose communication payload size is much smaller than a benchmark scheme, federated learning (FL), particularly when the model size is large. Moreover, user-generated data samples are likely to become non-IID across devices, which commonly degrades the performance compared to the case with an IID dataset. To cope with this, we propose federated augmentation (FAug), where each device collectively trains a generative model, and thereby augments its local data towards yielding an IID dataset. Empirical studies demonstrate that FD with FAug yields around 26x less communication overhead while achieving 95-98% test accuracy compared to FL.
    Model-agnostic variable importance for predictive uncertainty: an entropy-based approach. (arXiv:2310.12842v1 [stat.ML])
    In order to trust the predictions of a machine learning algorithm, it is necessary to understand the factors that contribute to those predictions. In the case of probabilistic and uncertainty-aware models, it is necessary to understand not only the reasons for the predictions themselves, but also the model's level of confidence in those predictions. In this paper, we show how existing methods in explainability can be extended to uncertainty-aware models and how such extensions can be used to understand the sources of uncertainty in a model's predictive distribution. In particular, by adapting permutation feature importance, partial dependence plots, and individual conditional expectation plots, we demonstrate that novel insights into model behaviour may be obtained and that these methods can be used to measure the impact of features on both the entropy of the predictive distribution and the log-likelihood of the ground truth labels under that distribution. With experiments using both synthetic and real-world data, we demonstrate the utility of these approaches in understanding both the sources of uncertainty and their impact on model performance.
    Neural networks with linear threshold activations: structure and algorithms. (arXiv:2111.08117v4 [cs.LG] UPDATED)
    In this article we present new results on neural networks with linear threshold activation functions. We precisely characterize the class of functions that are representable by such neural networks and show that 2 hidden layers are necessary and sufficient to represent any function representable in the class. This is a surprising result in the light of recent exact representability investigations for neural networks using other popular activation functions like rectified linear units (ReLU). We also give precise bounds on the sizes of the neural networks required to represent any function in the class. Finally, we design an algorithm to solve the empirical risk minimization (ERM) problem to global optimality for these neural networks with a fixed architecture. The algorithm's running time is polynomial in the size of the data sample, if the input dimension and the size of the network architecture are considered fixed constants. The algorithm is unique in the sense that it works for any architecture with any number of layers, whereas previous polynomial time globally optimal algorithms work only for very restricted classes of architectures. Using these insights, we propose a new class of neural networks that we call shortcut linear threshold networks. To the best of our knowledge, this way of designing neural networks has not been explored before in the literature. We show that these neural networks have several desirable theoretical properties.
    Generating collective counterfactual explanations in score-based classification via mathematical optimization. (arXiv:2310.12822v1 [stat.ML])
    Due to the increasing use of Machine Learning models in high stakes decision making settings, it has become increasingly important to have tools to understand how models arrive at decisions. Assuming a trained Supervised Classification model, explanations can be obtained via counterfactual analysis: a counterfactual explanation of an instance indicates how this instance should be minimally modified so that the perturbed instance is classified in the desired class by the Machine Learning classification model. Most of the Counterfactual Analysis literature focuses on the single-instance single-counterfactual setting, in which the analysis is done for one single instance to provide one single explanation. Taking a stakeholder's perspective, in this paper we introduce the so-called collective counterfactual explanations. By means of novel Mathematical Optimization models, we provide a counterfactual explanation for each instance in a group of interest, so that the total cost of the perturbations is minimized under some linking constraints. Making the process of constructing counterfactuals collective instead of individual enables us to detect the features that are critical to the entire dataset to have the individuals classified in the desired class. Our methodology allows for some instances to be treated individually, performing the collective counterfactual analysis for a fraction of records of the group of interest. This way, outliers are identified and handled appropriately. Under some assumptions on the classifier and the space in which counterfactuals are sought, finding collective counterfactuals is reduced to solving a convex quadratic linearly constrained mixed integer optimization problem, which, for datasets of moderate size, can be solved to optimality using existing solvers. The performance of our approach is illustrated on real-world datasets, demonstrating its usefulness.
    Hierarchical Forecasting at Scale. (arXiv:2310.12809v1 [cs.LG])
    Existing hierarchical forecasting techniques scale poorly when the number of time series increases. We propose to learn a coherent forecast for millions of time series with a single bottom-level forecast model by using a sparse loss function that directly optimizes the hierarchical product and/or temporal structure. The benefit of our sparse hierarchical loss function is that it provides practitioners a method of producing bottom-level forecasts that are coherent to any chosen cross-sectional or temporal hierarchy. In addition, removing the need for a post-processing step as required in traditional hierarchical forecasting techniques reduces the computational cost of the prediction phase in the forecasting pipeline. On the public M5 dataset, our sparse hierarchical loss function performs up to 10% (RMSE) better compared to the baseline loss function. We implement our sparse hierarchical loss function within an existing forecasting model at bol, a large European e-commerce platform, resulting in an improved forecasting performance of 2% at the product level. Finally, we found an increase in forecasting performance of about 5-10% when evaluating the forecasting performance across the cross-sectional hierarchies that we defined. These results demonstrate the usefulness of our sparse hierarchical loss applied to a production forecasting system at a major e-commerce platform.
    A Theoretical Approach to Characterize the Accuracy-Fairness Trade-off Pareto Frontier. (arXiv:2310.12785v1 [cs.LG])
    While the accuracy-fairness trade-off has been frequently observed in the literature of fair machine learning, rigorous theoretical analyses have been scarce. To demystify this long-standing challenge, this work seeks to develop a theoretical framework by characterizing the shape of the accuracy-fairness trade-off Pareto frontier (FairFrontier), determined by a set of all optimal Pareto classifiers that no other classifiers can dominate. Specifically, we first demonstrate the existence of the trade-off in real-world scenarios and then propose four potential categories to characterize the important properties of the accuracy-fairness Pareto frontier. For each category, we identify the necessary conditions that lead to corresponding trade-offs. Experimental results on synthetic data suggest insightful findings of the proposed framework: (1) When sensitive attributes can be fully interpreted by non-sensitive attributes, FairFrontier is mostly continuous. (2) Accuracy can suffer a \textit{sharp} decline when over-pursuing fairness. (3) Eliminate the trade-off via a two-step streamlined approach. The proposed research enables an in-depth understanding of the accuracy-fairness trade-off, pushing current fair machine-learning research to a new frontier.
    Copiloting the Copilots: Fusing Large Language Models with Completion Engines for Automated Program Repair. (arXiv:2309.00608v2 [cs.SE] UPDATED)
    During Automated Program Repair (APR), it can be challenging to synthesize correct patches for real-world systems in general-purpose programming languages. Recent Large Language Models (LLMs) have been shown to be helpful "copilots" in assisting developers with various coding tasks, and have also been directly applied for patch synthesis. However, most LLMs treat programs as sequences of tokens, meaning that they are ignorant of the underlying semantics constraints of the target programming language. This results in plenty of statically invalid generated patches, impeding the practicality of the technique. Therefore, we propose Repilot, a framework to further copilot the AI "copilots" (i.e., LLMs) by synthesizing more valid patches during the repair process. Our key insight is that many LLMs produce outputs autoregressively (i.e., token by token), resembling human writing programs, which can be significantly boosted and guided through a Completion Engine. Repilot synergistically synthesizes a candidate patch through the interaction between an LLM and a Completion Engine, which 1) prunes away infeasible tokens suggested by the LLM and 2) proactively completes the token based on the suggestions provided by the Completion Engine. Our evaluation on a subset of the widely-used Defects4j 1.2 and 2.0 datasets shows that Repilot fixes 66 and 50 bugs, respectively, surpassing the best-performing baseline by 14 and 16 bugs fixed. More importantly, Repilot is capable of producing more valid and correct patches than the base LLM when given the same generation budget.
    Bayesian tomography using polynomial chaos expansion and deep generative networks. (arXiv:2307.04228v4 [physics.geo-ph] UPDATED)
    Implementations of Markov chain Monte Carlo (MCMC) methods need to confront two fundamental challenges: accurate representation of prior information and efficient evaluation of likelihoods. Principal component analysis (PCA) and related techniques can in some cases facilitate the definition and sampling of the prior distribution, as well as the training of accurate surrogate models, using for instance, polynomial chaos expansion (PCE). However, complex geological priors with sharp contrasts necessitate more complex dimensionality-reduction techniques, such as, deep generative models (DGMs). By sampling a low-dimensional prior probability distribution defined in the low-dimensional latent space of such a model, it becomes possible to efficiently sample the physical domain at the price of a generator that is typically highly non-linear. Training a surrogate that is capable of capturing intricate non-linear relationships between latent parameters and outputs of forward modeling presents a notable challenge. Indeed, while PCE models provide high accuracy when the input-output relationship can be effectively approximated by relatively low-degree multivariate polynomials, this condition is typically not met when employing latent variables derived from DGMs. In this contribution, we present a strategy combining the excellent reconstruction performances of a variational autoencoder (VAE) with the accuracy of PCA-PCE surrogate modeling in the context of Bayesian ground penetrating radar (GPR) traveltime tomography. Within the MCMC process, the parametrization of the VAE is leveraged for prior exploration and sample proposals. Concurrently, surrogate modeling is conducted using PCE, which operates on either globally or locally defined principal components of the VAE samples under examination.
    Recurrent Neural Language Models as Probabilistic Finite-state Automata. (arXiv:2310.05161v2 [cs.CL] UPDATED)
    Studying language models (LMs) in terms of well-understood formalisms allows us to precisely characterize their abilities and limitations. Previous work has investigated the representational capacity of recurrent neural network (RNN) LMs in terms of their capacity to recognize unweighted formal languages. However, LMs do not describe unweighted formal languages -- rather, they define probability distributions over strings. In this work, we study what classes of such probability distributions RNN LMs can represent, which allows us to make more direct statements about their capabilities. We show that simple RNNs are equivalent to a subclass of probabilistic finite-state automata, and can thus model a strict subset of probability distributions expressible by finite-state models. Furthermore, we study the space complexity of representing finite-state LMs with RNNs. We show that, to represent an arbitrary deterministic finite-state LM with $N$ states over an alphabet $\Sigma$, an RNN requires $\Omega\left(N |\Sigma|\right)$ neurons. These results present a first step towards characterizing the classes of distributions RNN LMs can represent and thus help us understand their capabilities and limitations.
    Neurosymbolic Grounding for Compositional World Models. (arXiv:2310.12690v1 [cs.LG])
    We introduce Cosmos, a framework for object-centric world modeling that is designed for compositional generalization (CG), i.e., high performance on unseen input scenes obtained through the composition of known visual "atoms." The central insight behind Cosmos is the use of a novel form of neurosymbolic grounding. Specifically, the framework introduces two new tools: (i) neurosymbolic scene encodings, which represent each entity in a scene using a real vector computed using a neural encoder, as well as a vector of composable symbols describing attributes of the entity, and (ii) a neurosymbolic attention mechanism that binds these entities to learned rules of interaction. Cosmos is end-to-end differentiable; also, unlike traditional neurosymbolic methods that require representations to be manually mapped to symbols, it computes an entity's symbolic attributes using vision-language foundation models. Through an evaluation that considers two different forms of CG on an established blocks-pushing domain, we show that the framework establishes a new state-of-the-art for CG in world modeling.
    Rank-DETR for High Quality Object Detection. (arXiv:2310.08854v2 [cs.CV] UPDATED)
    Modern detection transformers (DETRs) use a set of object queries to predict a list of bounding boxes, sort them by their classification confidence scores, and select the top-ranked predictions as the final detection results for the given input image. A highly performant object detector requires accurate ranking for the bounding box predictions. For DETR-based detectors, the top-ranked bounding boxes suffer from less accurate localization quality due to the misalignment between classification scores and localization accuracy, thus impeding the construction of high-quality detectors. In this work, we introduce a simple and highly performant DETR-based object detector by proposing a series of rank-oriented designs, combinedly called Rank-DETR. Our key contributions include: (i) a rank-oriented architecture design that can prompt positive predictions and suppress the negative ones to ensure lower false positive rates, as well as (ii) a rank-oriented loss function and matching cost design that prioritizes predictions of more accurate localization accuracy during ranking to boost the AP under high IoU thresholds. We apply our method to improve the recent SOTA methods (e.g., H-DETR and DINO-DETR) and report strong COCO object detection results when using different backbones such as ResNet-$50$, Swin-T, and Swin-L, demonstrating the effectiveness of our approach. Code is available at \url{https://github.com/LeapLabTHU/Rank-DETR}.
    Neural networks for insurance pricing with frequency and severity data: a benchmark study from data preprocessing to technical tariff. (arXiv:2310.12671v1 [cs.LG])
    Insurers usually turn to generalized linear models for modelling claim frequency and severity data. Due to their success in other fields, machine learning techniques are gaining popularity within the actuarial toolbox. Our paper contributes to the literature on frequency-severity insurance pricing with machine learning via deep learning structures. We present a benchmark study on four insurance data sets with frequency and severity targets in the presence of multiple types of input features. We compare in detail the performance of: a generalized linear model on binned input data, a gradient-boosted tree model, a feed-forward neural network (FFNN), and the combined actuarial neural network (CANN). Our CANNs combine a baseline prediction established with a GLM and GBM, respectively, with a neural network correction. We explain the data preprocessing steps with specific focus on the multiple types of input features typically present in tabular insurance data sets, such as postal codes, numeric and categorical covariates. Autoencoders are used to embed the categorical variables into the neural network and we explore their potential advantages in a frequency-severity setting. Finally, we construct global surrogate models for the neural nets' frequency and severity models. These surrogates enable the translation of the essential insights captured by the FFNNs or CANNs to GLMs. As such, a technical tariff table results that can easily be deployed in practice.
    Amazon-M2: A Multilingual Multi-locale Shopping Session Dataset for Recommendation and Text Generation. (arXiv:2307.09688v2 [cs.IR] UPDATED)
    Modeling customer shopping intentions is a crucial task for e-commerce, as it directly impacts user experience and engagement. Thus, accurately understanding customer preferences is essential for providing personalized recommendations. Session-based recommendation, which utilizes customer session data to predict their next interaction, has become increasingly popular. However, existing session datasets have limitations in terms of item attributes, user diversity, and dataset scale. As a result, they cannot comprehensively capture the spectrum of user behaviors and preferences. To bridge this gap, we present the Amazon Multilingual Multi-locale Shopping Session Dataset, namely Amazon-M2. It is the first multilingual dataset consisting of millions of user sessions from six different locales, where the major languages of products are English, German, Japanese, French, Italian, and Spanish. Remarkably, the dataset can help us enhance personalization and understanding of user preferences, which can benefit various existing tasks as well as enable new tasks. To test the potential of the dataset, we introduce three tasks in this work: (1) next-product recommendation, (2) next-product recommendation with domain shifts, and (3) next-product title generation. With the above tasks, we benchmark a range of algorithms on our proposed dataset, drawing new insights for further research and practice. In addition, based on the proposed dataset and tasks, we hosted a competition in the KDD CUP 2023 and have attracted thousands of users and submissions. The winning solutions and the associated workshop can be accessed at our website https://kddcup23.github.io/.
    zkFL: Zero-Knowledge Proof-based Gradient Aggregation for Federated Learning. (arXiv:2310.02554v3 [cs.AI] UPDATED)
    Federated Learning (FL) is a machine learning paradigm, which enables multiple and decentralized clients to collaboratively train a model under the orchestration of a central aggregator. Traditional FL solutions rely on the trust assumption of the centralized aggregator, which forms cohorts of clients in a fair and honest manner. However, a malicious aggregator, in reality, could abandon and replace the client's training models, or launch Sybil attacks to insert fake clients. Such malicious behaviors give the aggregator more power to control clients in the FL setting and determine the final training results. In this work, we introduce zkFL, which leverages zero-knowledge proofs (ZKPs) to tackle the issue of a malicious aggregator during the training model aggregation process. To guarantee the correct aggregation results, the aggregator needs to provide a proof per round. The proof can demonstrate to the clients that the aggregator executes the intended behavior faithfully. To further reduce the verification cost of clients, we employ a blockchain to handle the proof in a zero-knowledge way, where miners (i.e., the nodes validating and maintaining the blockchain data) can verify the proof without knowing the clients' local and aggregated models. The theoretical analysis and empirical results show that zkFL can achieve better security and privacy than traditional FL, without modifying the underlying FL network structure or heavily compromising the training speed.
    An effective theory of collective deep learning. (arXiv:2310.12802v1 [physics.soc-ph])
    Unraveling the emergence of collective learning in systems of coupled artificial neural networks is an endeavor with broader implications for physics, machine learning, neuroscience and society. Here we introduce a minimal model that condenses several recent decentralized algorithms by considering a competition between two terms: the local learning dynamics in the parameters of each neural network unit, and a diffusive coupling among units that tends to homogenize the parameters of the ensemble. We derive the coarse-grained behavior of our model via an effective theory for linear networks that we show is analogous to a deformed Ginzburg-Landau model with quenched disorder. This framework predicts (depth-dependent) disorder-order-disorder phase transitions in the parameters' solutions that reveal the onset of a collective learning phase, along with a depth-induced delay of the critical point and a robust shape of the microscopic learning path. We validate our theory in realistic ensembles of coupled nonlinear networks trained in the MNIST dataset under privacy constraints. Interestingly, experiments confirm that individual networks -- trained only with private data -- can fully generalize to unseen data classes when the collective learning phase emerges. Our work elucidates the physics of collective learning and contributes to the mechanistic interpretability of deep learning in decentralized settings.
    Provably Powerful Graph Neural Networks for Directed Multigraphs. (arXiv:2306.11586v2 [cs.LG] UPDATED)
    This paper analyses a set of simple adaptations that transform standard message-passing Graph Neural Networks (GNN) into provably powerful directed multigraph neural networks. The adaptations include multigraph port numbering, ego IDs, and reverse message passing. We prove that the combination of these theoretically enables the detection of any directed subgraph pattern. To validate the effectiveness of our proposed adaptations in practice, we conduct experiments on synthetic subgraph detection tasks, which demonstrate outstanding performance with almost perfect results. Moreover, we apply our proposed adaptations to two financial crime analysis tasks. We observe dramatic improvements in detecting money laundering transactions, improving the minority-class F1 score of a standard message-passing GNN by up to 30%, and closely matching or outperforming tree-based and GNN baselines. Similarly impressive results are observed on a real-world phishing detection dataset, boosting three standard GNNs' F1 scores by around 15% and outperforming all baselines.
    Neural Likelihood Approximation for Integer Valued Time Series Data. (arXiv:2310.12544v1 [stat.ML])
    Stochastic processes defined on integer valued state spaces are popular within the physical and biological sciences. These models are necessary for capturing the dynamics of small systems where the individual nature of the populations cannot be ignored and stochastic effects are important. The inference of the parameters of such models, from time series data, is difficult due to intractability of the likelihood; current methods, based on simulations of the underlying model, can be so computationally expensive as to be prohibitive. In this paper we construct a neural likelihood approximation for integer valued time series data using causal convolutions, which allows us to evaluate the likelihood of the whole time series in parallel. We demonstrate our method by performing inference on a number of ecological and epidemiological models, showing that we can accurately approximate the true posterior while achieving significant computational speed ups in situations where current methods struggle.
    Inverse Renormalization Group of Disordered Systems. (arXiv:2310.12631v1 [cond-mat.stat-mech])
    We propose inverse renormalization group transformations to construct approximate configurations for lattice volumes that have not yet been accessed by supercomputers or large-scale simulations in the study of spin glasses. Specifically, starting from lattices of volume $V=8^{3}$ in the case of the three-dimensional Edwards-Anderson model we employ machine learning algorithms to construct rescaled lattices up to $V'=128^{3}$, which we utilize to extract two critical exponents. We conclude by discussing how to incorporate numerical exactness within inverse renormalization group approaches of disordered systems, thus opening up the opportunity to explore a sustainable and energy-efficient generation of exact configurations for increasing lattice volumes without the use of dedicated supercomputers.
    Compression of Recurrent Neural Networks using Matrix Factorization. (arXiv:2310.12688v1 [cs.LG])
    Compressing neural networks is a key step when deploying models for real-time or embedded applications. Factorizing the model's matrices using low-rank approximations is a promising method for achieving compression. While it is possible to set the rank before training, this approach is neither flexible nor optimal. In this work, we propose a post-training rank-selection method called Rank-Tuning that selects a different rank for each matrix. Used in combination with training adaptations, our method achieves high compression rates with no or little performance degradation. Our numerical experiments on signal processing tasks show that we can compress recurrent neural networks up to 14x with at most 1.4% relative performance reduction.
    Networkwide Traffic State Forecasting Using Exogenous Information: A Multi-Dimensional Graph Attention-Based Approach. (arXiv:2310.12353v1 [cs.LG])
    Traffic state forecasting is crucial for traffic management and control strategies, as well as user- and system-level decision making in the transportation network. While traffic forecasting has been approached with a variety of techniques over the last couple of decades, most approaches simply rely on endogenous traffic variables for state prediction, despite the evidence that exogenous factors can significantly impact traffic conditions. This paper proposes a multi-dimensional spatio-temporal graph attention-based traffic prediction approach (M-STGAT), which predicts traffic based on past observations of speed, along with lane closure events, temperature, and visibility across the transportation network. The approach is based on a graph attention network architecture, which also learns based on the structure of the transportation network on which these variables are observed. Numerical experiments are performed using traffic speed and lane closure data from the California Department of Transportation (Caltrans) Performance Measurement System (PeMS). The corresponding weather data were downloaded from the National Oceanic and Atmospheric Administration (NOOA) Automated Surface Observing Systems (ASOS). For comparison, the numerical experiments implement three alternative models which do not allow for the multi-dimensional input. The M-STGAT is shown to outperform the three alternative models, when performing tests using our primary data set for prediction with a 30-, 45-, and 60-minute prediction horizon, in terms of three error measures: Mean Absolute Error (MAE), Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE). However, the model's transferability can vary for different transfer data sets and this aspect may require further investigation.
    Time-Aware Representation Learning for Time-Sensitive Question Answering. (arXiv:2310.12585v1 [cs.CL])
    Time is one of the crucial factors in real-world question answering (QA) problems. However, language models have difficulty understanding the relationships between time specifiers, such as 'after' and 'before', and numbers, since existing QA datasets do not include sufficient time expressions. To address this issue, we propose a Time-Context aware Question Answering (TCQA) framework. We suggest a Time-Context dependent Span Extraction (TCSE) task, and build a time-context dependent data generation framework for model training. Moreover, we present a metric to evaluate the time awareness of the QA model using TCSE. The TCSE task consists of a question and four sentence candidates classified as correct or incorrect based on time and context. The model is trained to extract the answer span from the sentence that is both correct in time and context. The model trained with TCQA outperforms baseline models up to 8.5 of the F1-score in the TimeQA dataset. Our dataset and code are available at https://github.com/sonjbin/TCQA
    Voyager: An Open-Ended Embodied Agent with Large Language Models. (arXiv:2305.16291v2 [cs.AI] UPDATED)
    We introduce Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention. Voyager consists of three key components: 1) an automatic curriculum that maximizes exploration, 2) an ever-growing skill library of executable code for storing and retrieving complex behaviors, and 3) a new iterative prompting mechanism that incorporates environment feedback, execution errors, and self-verification for program improvement. Voyager interacts with GPT-4 via blackbox queries, which bypasses the need for model parameter fine-tuning. The skills developed by Voyager are temporally extended, interpretable, and compositional, which compounds the agent's abilities rapidly and alleviates catastrophic forgetting. Empirically, Voyager shows strong in-context lifelong learning capability and exhibits exceptional proficiency in playing Minecraft. It obtains 3.3x more unique items, travels 2.3x longer distances, and unlocks key tech tree milestones up to 15.3x faster than prior SOTA. Voyager is able to utilize the learned skill library in a new Minecraft world to solve novel tasks from scratch, while other techniques struggle to generalize. We open-source our full codebase and prompts at https://voyager.minedojo.org/.
    An ML-assisted OTFS vs. OFDM adaptable modem. (arXiv:2309.01319v2 [eess.SP] UPDATED)
    The Orthogonal-Time-Frequency-Space (OTFS) signaling is known to be resilient to doubly-dispersive channels, which impacts high mobility scenarios. On the other hand, the Orthogonal-Frequency-Division-Multiplexing (OFDM) waveforms enjoy the benefits of the reuse of legacy architectures, simplicity of receiver design, and low-complexity detection. Several studies that compare the performance of OFDM and OTFS have indicated mixed outcomes due to the plethora of system parameters at play beyond high-mobility conditions. In this work, we exemplify this observation using simulations and propose a deep neural network (DNN)-based adaptation scheme to switch between using either an OTFS or OFDM signal processing chain at the transmitter and receiver for optimal mean-squared-error (MSE) performance. The DNN classifier is trained to switch between the two schemes by observing the channel condition, received SNR, and modulation format. We compare the performance of the OTFS, OFDM, and the proposed switched-waveform scheme. The simulations indicate superior performance with the proposed scheme with a well-trained DNN, thus improving the MSE performance of the communication significantly.
    Detection and Evaluation of bias-inducing Features in Machine learning. (arXiv:2310.12805v1 [cs.LG])
    The cause-to-effect analysis can help us decompose all the likely causes of a problem, such as an undesirable business situation or unintended harm to the individual(s). This implies that we can identify how the problems are inherited, rank the causes to help prioritize fixes, simplify a complex problem and visualize them. In the context of machine learning (ML), one can use cause-to-effect analysis to understand the reason for the biased behavior of the system. For example, we can examine the root causes of biases by checking each feature for a potential cause of bias in the model. To approach this, one can apply small changes to a given feature or a pair of features in the data, following some guidelines and observing how it impacts the decision made by the model (i.e., model prediction). Therefore, we can use cause-to-effect analysis to identify the potential bias-inducing features, even when these features are originally are unknown. This is important since most current methods require a pre-identification of sensitive features for bias assessment and can actually miss other relevant bias-inducing features, which is why systematic identification of such features is necessary. Moreover, it often occurs that to achieve an equitable outcome, one has to take into account sensitive features in the model decision. Therefore, it should be up to the domain experts to decide based on their knowledge of the context of a decision whether bias induced by specific features is acceptable or not. In this study, we propose an approach for systematically identifying all bias-inducing features of a model to help support the decision-making of domain experts. We evaluated our technique using four well-known datasets to showcase how our contribution can help spearhead the standard procedure when developing, testing, maintaining, and deploying fair/equitable machine learning systems.
    Gradient Descent Fails to Learn High-frequency Functions and Modular Arithmetic. (arXiv:2310.12660v1 [cs.LG])
    Classes of target functions containing a large number of approximately orthogonal elements are known to be hard to learn by the Statistical Query algorithms. Recently this classical fact re-emerged in a theory of gradient-based optimization of neural networks. In the novel framework, the hardness of a class is usually quantified by the variance of the gradient with respect to a random choice of a target function. A set of functions of the form $x\to ax \bmod p$, where $a$ is taken from ${\mathbb Z}_p$, has attracted some attention from deep learning theorists and cryptographers recently. This class can be understood as a subset of $p$-periodic functions on ${\mathbb Z}$ and is tightly connected with a class of high-frequency periodic functions on the real line. We present a mathematical analysis of limitations and challenges associated with using gradient-based learning techniques to train a high-frequency periodic function or modular multiplication from examples. We highlight that the variance of the gradient is negligibly small in both cases when either a frequency or the prime base $p$ is large. This in turn prevents such a learning algorithm from being successful.
    Red Teaming Language Model Detectors with Language Models. (arXiv:2305.19713v2 [cs.CL] UPDATED)
    The prevalence and strong capability of large language models (LLMs) present significant safety and ethical risks if exploited by malicious users. To prevent the potentially deceptive usage of LLMs, recent works have proposed algorithms to detect LLM-generated text and protect LLMs. In this paper, we investigate the robustness and reliability of these LLM detectors under adversarial attacks. We study two types of attack strategies: 1) replacing certain words in an LLM's output with their synonyms given the context; 2) automatically searching for an instructional prompt to alter the writing style of the generation. In both strategies, we leverage an auxiliary LLM to generate the word replacements or the instructional prompt. Different from previous works, we consider a challenging setting where the auxiliary LLM can also be protected by a detector. Experiments reveal that our attacks effectively compromise the performance of all detectors in the study with plausible generations, underscoring the urgent need to improve the robustness of LLM-generated text detection systems.
    Test-Time Distribution Normalization for Contrastively Learned Vision-language Models. (arXiv:2302.11084v2 [cs.LG] UPDATED)
    Advances in the field of vision-language contrastive learning have made it possible for many downstream applications to be carried out efficiently and accurately by simply taking the dot product between image and text representations. One of the most representative approaches proposed recently known as CLIP has garnered widespread adoption due to its effectiveness. CLIP is trained with an InfoNCE loss that takes into account both positive and negative samples to help learn a much more robust representation space. This paper reveals that the common downstream practice of taking a dot product is only a zeroth-order approximation of the optimization goal, resulting in a loss of information during test-time. Intuitively, since the model has been optimized based on the InfoNCE loss, test-time procedures should also be in alignment. The question lies in how one can retrieve any semblance of negative samples information during inference in a computationally efficient way. To this end, we propose Distribution Normalization (DN), where we approximate the mean representation of a batch of test samples and use such a mean to represent what would be analogous to negative samples in the InfoNCE loss. DN requires no retraining or fine-tuning and can be effortlessly applied during inference. Extensive experiments on a wide variety of downstream tasks exhibit a clear advantage of DN over the dot product on top of other existing test-time augmentation methods.
    Post-processing Private Synthetic Data for Improving Utility on Selected Measures. (arXiv:2305.15538v2 [cs.LG] UPDATED)
    Existing private synthetic data generation algorithms are agnostic to downstream tasks. However, end users may have specific requirements that the synthetic data must satisfy. Failure to meet these requirements could significantly reduce the utility of the data for downstream use. We introduce a post-processing technique that improves the utility of the synthetic data with respect to measures selected by the end user, while preserving strong privacy guarantees and dataset quality. Our technique involves resampling from the synthetic data to filter out samples that do not meet the selected utility measures, using an efficient stochastic first-order algorithm to find optimal resampling weights. Through comprehensive numerical experiments, we demonstrate that our approach consistently improves the utility of synthetic data across multiple benchmark datasets and state-of-the-art synthetic data generation algorithms.
    Physics-informed neural networks in the recreation of hydrodynamic simulations from dark matter. (arXiv:2303.14090v2 [astro-ph.CO] UPDATED)
    Physics-informed neural networks have emerged as a coherent framework for building predictive models that combine statistical patterns with domain knowledge. The underlying notion is to enrich the optimization loss function with known relationships to constrain the space of possible solutions. Hydrodynamic simulations are a core constituent of modern cosmology, while the required computations are both expensive and time-consuming. At the same time, the comparatively fast simulation of dark matter requires fewer resources, which has led to the emergence of machine learning algorithms for baryon inpainting as an active area of research; here, recreating the scatter found in hydrodynamic simulations is an ongoing challenge. This paper presents the first application of physics-informed neural networks to baryon inpainting by combining advances in neural network architectures with physical constraints, injecting theory on baryon conversion efficiency into the model loss function. We also introduce a punitive prediction comparison based on the Kullback-Leibler divergence, which enforces scatter reproduction. By simultaneously extracting the complete set of baryonic properties for the Simba suite of cosmological simulations, our results demonstrate improved accuracy of baryonic predictions based on dark matter halo properties, successful recovery of the fundamental metallicity relation, and retrieve scatter that traces the target simulation's distribution.
    Towards a Deep Learning-based Online Quality Prediction System for Welding Processes. (arXiv:2310.12632v1 [cs.LG])
    The digitization of manufacturing processes enables promising applications for machine learning-assisted quality assurance. A widely used manufacturing process that can strongly benefit from data-driven solutions is \ac{GMAW}. The welding process is characterized by complex cause-effect relationships between material properties, process conditions and weld quality. In non-laboratory environments with frequently changing process parameters, accurate determination of weld quality by destructive testing is economically unfeasible. Deep learning offers the potential to identify the relationships in available process data and predict the weld quality from process observations. In this paper, we present a concept for a deep learning based predictive quality system in \ac{GMAW}. At its core, the concept involves a pipeline consisting of four major phases: collection and management of multi-sensor data (e.g. current and voltage), real-time processing and feature engineering of the time series data by means of autoencoders, training and deployment of suitable recurrent deep learning models for quality predictions, and model evolutions under changing process conditions using continual learning. The concept provides the foundation for future research activities in which we will realize an online predictive quality system for running production.
    Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark. (arXiv:2310.12567v1 [cs.AI])
    Artificial intelligence (AI) systems possess significant potential to drive societal progress. However, their deployment often faces obstacles due to substantial safety concerns. Safe reinforcement learning (SafeRL) emerges as a solution to optimize policies while simultaneously adhering to multiple constraints, thereby addressing the challenge of integrating reinforcement learning in safety-critical scenarios. In this paper, we present an environment suite called Safety-Gymnasium, which encompasses safety-critical tasks in both single and multi-agent scenarios, accepting vector and vision-only input. Additionally, we offer a library of algorithms named Safe Policy Optimization (SafePO), comprising 16 state-of-the-art SafeRL algorithms. This comprehensive library can serve as a validation tool for the research community. By introducing this benchmark, we aim to facilitate the evaluation and comparison of safety performance, thus fostering the development of reinforcement learning for safer, more reliable, and responsible real-world applications. The website of this project can be accessed at https://sites.google.com/view/safety-gymnasium.
    Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study. (arXiv:2304.06762v2 [cs.CL] UPDATED)
    Large decoder-only language models (LMs) can be largely improved in terms of perplexity by retrieval (e.g., RETRO), but its impact on text generation quality and downstream task accuracy is unclear. Thus, it is still an open question: shall we pretrain large autoregressive LMs with retrieval? To answer it, we perform a comprehensive study on a scalable pre-trained retrieval-augmented LM (i.e., RETRO) compared with standard GPT and retrieval-augmented GPT incorporated at fine-tuning or inference stages. We first provide the recipe to reproduce RETRO up to 9.5B parameters while retrieving a text corpus with 330B tokens. Based on that, we have the following novel findings: i) RETRO outperforms GPT on text generation with much less degeneration (i.e., repetition), moderately higher factual accuracy, and slightly lower toxicity with a nontoxic retrieval database. ii) On the LM Evaluation Harness benchmark, RETRO largely outperforms GPT on knowledge-intensive tasks, but is on par with GPT on other tasks. Furthermore, we introduce a simple variant of the model, RETRO++, which largely improves open-domain QA results of original RETRO (e.g., EM score +8.6 on Natural Question) and significantly outperforms retrieval-augmented GPT in both fine-tuning and zero-shot evaluation settings. Our findings highlight the promising direction of pretraining autoregressive LMs with retrieval as future foundation models. We release our implementation at: https://github.com/NVIDIA/Megatron-LM#retro.
    REVAMP: Automated Simulations of Adversarial Attacks on Arbitrary Objects in Realistic Scenes. (arXiv:2310.12243v1 [cs.LG])
    Deep Learning models, such as those used in an autonomous vehicle are vulnerable to adversarial attacks where an attacker could place an adversarial object in the environment, leading to mis-classification. Generating these adversarial objects in the digital space has been extensively studied, however successfully transferring these attacks from the digital realm to the physical realm has proven challenging when controlling for real-world environmental factors. In response to these limitations, we introduce REVAMP, an easy-to-use Python library that is the first-of-its-kind tool for creating attack scenarios with arbitrary objects and simulating realistic environmental factors, lighting, reflection, and refraction. REVAMP enables researchers and practitioners to swiftly explore various scenarios within the digital realm by offering a wide range of configurable options for designing experiments and using differentiable rendering to reproduce physically plausible adversarial objects. We will demonstrate and invite the audience to try REVAMP to produce an adversarial texture on a chosen object while having control over various scene parameters. The audience will choose a scene, an object to attack, the desired attack class, and the number of camera positions to use. Then, in real time, we show how this altered texture causes the chosen object to be mis-classified, showcasing the potential of REVAMP in real-world scenarios. REVAMP is open-source and available at https://github.com/poloclub/revamp.
    Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models. (arXiv:2304.12526v2 [cs.CV] UPDATED)
    Diffusion models are powerful, but they require a lot of time and data to train. We propose Patch Diffusion, a generic patch-wise training framework, to significantly reduce the training time costs while improving data efficiency, which thus helps democratize diffusion model training to broader users. At the core of our innovations is a new conditional score function at the patch level, where the patch location in the original image is included as additional coordinate channels, while the patch size is randomized and diversified throughout training to encode the cross-region dependency at multiple scales. Sampling with our method is as easy as in the original diffusion model. Through Patch Diffusion, we could achieve $\mathbf{\ge 2\times}$ faster training, while maintaining comparable or better generation quality. Patch Diffusion meanwhile improves the performance of diffusion models trained on relatively small datasets, $e.g.$, as few as 5,000 images to train from scratch. We achieve outstanding FID scores in line with state-of-the-art benchmarks: 1.77 on CelebA-64$\times$64, 1.93 on AFHQv2-Wild-64$\times$64, and 2.72 on ImageNet-256$\times$256. We share our code and pre-trained models at https://github.com/Zhendong-Wang/Patch-Diffusion.
    On the Optimization and Generalization of Multi-head Attention. (arXiv:2310.12680v1 [cs.LG])
    The training and generalization dynamics of the Transformer's core mechanism, namely the Attention mechanism, remain under-explored. Besides, existing analyses primarily focus on single-head attention. Inspired by the demonstrated benefits of overparameterization when training fully-connected networks, we investigate the potential optimization and generalization advantages of using multiple attention heads. Towards this goal, we derive convergence and generalization guarantees for gradient-descent training of a single-layer multi-head self-attention model, under a suitable realizability condition on the data. We then establish primitive conditions on the initialization that ensure realizability holds. Finally, we demonstrate that these conditions are satisfied for a simple tokenized-mixture model. We expect the analysis can be extended to various data-model and architecture variations.
    A PAC Learning Algorithm for LTL and Omega-regular Objectives in MDPs. (arXiv:2310.12248v1 [cs.LG])
    Linear temporal logic (LTL) and omega-regular objectives -- a superset of LTL -- have seen recent use as a way to express non-Markovian objectives in reinforcement learning. We introduce a model-based probably approximately correct (PAC) learning algorithm for omega-regular objectives in Markov decision processes. Unlike prior approaches, our algorithm learns from sampled trajectories of the system and does not require prior knowledge of the system's topology.
    Personalized human mobility prediction for HuMob challenge. (arXiv:2310.12900v1 [cs.LG])
    We explain the methodology used to create the data submitted to HuMob Challenge, a data analysis competition for human mobility prediction. We adopted a personalized model to predict the individual's movement trajectory from their data, instead of predicting from the overall movement, based on the hypothesis that human movement is unique to each person. We devised the features such as the date and time, activity time, days of the week, time of day, and frequency of visits to POI (Point of Interest). As additional features, we incorporated the movement of other individuals with similar behavior patterns through the employment of clustering. The machine learning model we adopted was the Support Vector Regression (SVR). We performed accuracy through offline assessment and carried out feature selection and parameter tuning. Although overall dataset provided consists of 100,000 users trajectory, our method use only 20,000 target users data, and do not need to use other 80,000 data. Despite the personalized model's traditional feature engineering approach, this model yields reasonably good accuracy with lower computational cost.
    Denoising Heat-inspired Diffusion with Insulators for Collision Free Motion Planning. (arXiv:2310.12609v1 [cs.RO])
    Diffusion models have risen as a powerful tool in robotics due to their flexibility and multi-modality. While some of these methods effectively address complex problems, they often depend heavily on inference-time obstacle detection and require additional equipment. Addressing these challenges, we present a method that, during inference time, simultaneously generates only reachable goals and plans motions that avoid obstacles, all from a single visual input. Central to our approach is the novel use of a collision-avoiding diffusion kernel for training. Through evaluations against behavior-cloning and classical diffusion models, our framework has proven its robustness. It is particularly effective in multi-modal environments, navigating toward goals and avoiding unreachable ones blocked by obstacles, while ensuring collision avoidance.
    Towards Better Dynamic Graph Learning: New Architecture and Unified Library. (arXiv:2303.13047v3 [cs.LG] UPDATED)
    We propose DyGFormer, a new Transformer-based architecture for dynamic graph learning. DyGFormer is conceptually simple and only needs to learn from nodes' historical first-hop interactions by: (1) a neighbor co-occurrence encoding scheme that explores the correlations of the source node and destination node based on their historical sequences; (2) a patching technique that divides each sequence into multiple patches and feeds them to Transformer, allowing the model to effectively and efficiently benefit from longer histories. We also introduce DyGLib, a unified library with standard training pipelines, extensible coding interfaces, and comprehensive evaluating protocols to promote reproducible, scalable, and credible dynamic graph learning research. By performing exhaustive experiments on thirteen datasets for dynamic link prediction and dynamic node classification tasks, we find that DyGFormer achieves state-of-the-art performance on most of the datasets, demonstrating its effectiveness in capturing nodes' correlations and long-term temporal dependencies. Moreover, some results of baselines are inconsistent with previous reports, which may be caused by their diverse but less rigorous implementations, showing the importance of DyGLib. All the used resources are publicly available at https://github.com/yule-BUAA/DyGLib.
    Stochastic Average Gradient : A Simple Empirical Investigation. (arXiv:2310.12771v1 [cs.LG])
    Despite the recent growth of theoretical studies and empirical successes of neural networks, gradient backpropagation is still the most widely used algorithm for training such networks. On the one hand, we have deterministic or full gradient (FG) approaches that have a cost proportional to the amount of training data used but have a linear convergence rate, and on the other hand, stochastic gradient (SG) methods that have a cost independent of the size of the dataset, but have a less optimal convergence rate than the determinist approaches. To combine the cost of the stochastic approach with the convergence rate of the deterministic approach, a stochastic average gradient (SAG) has been proposed. SAG is a method for optimizing the sum of a finite number of smooth convex functions. Like SG methods, the SAG method's iteration cost is independent of the number of terms in the sum. In this work, we propose to compare SAG to some standard optimizers used in machine learning. SAG converges faster than other optimizers on simple toy problems and performs better than many other optimizers on simple machine learning problems. We also propose a combination of SAG with the momentum algorithm and Adam. These combinations allow empirically higher speed and obtain better performance than the other methods, especially when the landscape of the function to optimize presents obstacles or is ill-conditioned.
    Approximate information maximization for bandit games. (arXiv:2310.12563v1 [stat.ML])
    Entropy maximization and free energy minimization are general physical principles for modeling the dynamics of various physical systems. Notable examples include modeling decision-making within the brain using the free-energy principle, optimizing the accuracy-complexity trade-off when accessing hidden variables with the information bottleneck principle (Tishby et al., 2000), and navigation in random environments using information maximization (Vergassola et al., 2007). Built on this principle, we propose a new class of bandit algorithms that maximize an approximation to the information of a key variable within the system. To this end, we develop an approximated analytical physics-based representation of an entropy to forecast the information gain of each action and greedily choose the one with the largest information gain. This method yields strong performances in classical bandit settings. Motivated by its empirical success, we prove its asymptotic optimality for the two-armed bandit problem with Gaussian rewards. Owing to its ability to encompass the system's properties in a global physical functional, this approach can be efficiently adapted to more complex bandit settings, calling for further investigation of information maximization approaches for multi-armed bandit problems.
    Testing the Consistency of Performance Scores Reported for Binary Classification Problems. (arXiv:2310.12527v1 [cs.LG])
    Binary classification is a fundamental task in machine learning, with applications spanning various scientific domains. Whether scientists are conducting fundamental research or refining practical applications, they typically assess and rank classification techniques based on performance metrics such as accuracy, sensitivity, and specificity. However, reported performance scores may not always serve as a reliable basis for research ranking. This can be attributed to undisclosed or unconventional practices related to cross-validation, typographical errors, and other factors. In a given experimental setup, with a specific number of positive and negative test items, most performance scores can assume specific, interrelated values. In this paper, we introduce numerical techniques to assess the consistency of reported performance scores and the assumed experimental setup. Importantly, the proposed approach does not rely on statistical inference but uses numerical methods to identify inconsistencies with certainty. Through three different applications related to medicine, we demonstrate how the proposed techniques can effectively detect inconsistencies, thereby safeguarding the integrity of research fields. To benefit the scientific community, we have made the consistency tests available in an open-source Python package.
    A Scalable Test Problem Generator for Sequential Transfer Optimization. (arXiv:2304.08503v4 [cs.NE] UPDATED)
    Sequential transfer optimization (STO), which aims to improve the optimization performance on a task of interest by exploiting the knowledge captured from several previously-solved optimization tasks stored in a database, has been gaining increasing research attention over the years. However, despite the remarkable advances in algorithm design, the development of a systematic benchmark suite for comprehensive comparisons of STO algorithms received far less attention. Existing test problems are either simply generated by assembling other benchmark functions or extended from specific practical problems with limited scalability. The relationships between the optimal solutions of the source and target tasks in these problems are also often manually configured, limiting their ability to model different similarity relationships presented in real-world problems. Consequently, the good performance achieved by an algorithm on these problems might be biased and hard to be generalized to other problems. In light of the above, in this study, we first introduce four concepts for characterizing STO problems and present an important problem feature, namely similarity distribution, which quantitatively delineates the relationship between the optima of the source and target tasks. Then, we present the general design guidelines of STO problems and a particular STO problem generator with good scalability. Specifically, the similarity distribution of a problem can be easily customized, enabling a continuous spectrum of representation of the diverse similarity relationships of real-world problems. Lastly, a benchmark suite with 12 STO problems featured by a variety of customized similarity relationships is developed using the proposed generator. The source code of the problem generator is available at https://github.com/XmingHsueh/STOP-G.
    SemantIC: Semantic Interference Cancellation Towards 6G Wireless Communications. (arXiv:2310.12768v1 [eess.SP])
    This letter proposes a novel anti-interference technique, semantic interference cancellation (SemantIC), for enhancing information quality towards the sixth-generation (6G) wireless networks. SemantIC only requires the receiver to concatenate the channel decoder with a semantic auto-encoder. This constructs a turbo loop which iteratively and alternately eliminates noise in the signal domain and the semantic domain. From the viewpoint of network information theory, the neural network of the semantic auto-encoder stores side information by training, and provides side information in iterative decoding, as an implementation of the Wyner-Ziv theorem. Simulation results verify the performance improvement by SemantIC without extra channel resource cost.
    Automatic Hallucination Assessment for Aligned Large Language Models via Transferable Adversarial Attacks. (arXiv:2310.12516v1 [cs.CL])
    Although remarkable progress has been achieved in preventing large language model (LLM) hallucinations using instruction tuning and retrieval augmentation, it remains challenging to measure the reliability of LLMs using human-crafted evaluation data which is not available for many tasks and domains and could suffer from data leakage. Inspired by adversarial machine learning, this paper aims to develop a method of automatically generating evaluation data by appropriately modifying existing data on which LLMs behave faithfully. Specifically, this paper presents AutoDebug, an LLM-based framework to use prompting chaining to generate transferable adversarial attacks in the form of question-answering examples. We seek to understand the extent to which these examples trigger the hallucination behaviors of LLMs. We implement AutoDebug using ChatGPT and evaluate the resulting two variants of a popular open-domain question-answering dataset, Natural Questions (NQ), on a collection of open-source and proprietary LLMs under various prompting settings. Our generated evaluation data is human-readable and, as we show, humans can answer these modified questions well. Nevertheless, we observe pronounced accuracy drops across multiple LLMs including GPT-4. Our experimental results show that LLMs are likely to hallucinate in two categories of question-answering scenarios where (1) there are conflicts between knowledge given in the prompt and their parametric knowledge, or (2) the knowledge expressed in the prompt is complex. Finally, we find that the adversarial examples generated by our method are transferable across all considered LLMs. The examples generated by a small model can be used to debug a much larger model, making our approach cost-effective.
    Open-World Lifelong Graph Learning. (arXiv:2310.12565v1 [cs.LG])
    We study the problem of lifelong graph learning in an open-world scenario, where a model needs to deal with new tasks and potentially unknown classes. We utilize Out-of-Distribution (OOD) detection methods to recognize new classes and adapt existing non-graph OOD detection methods to graph data. Crucially, we suggest performing new class detection by combining OOD detection methods with information aggregated from the graph neighborhood. Most OOD detection methods avoid determining a crisp threshold for deciding whether a vertex is OOD. To tackle this problem, we propose a Weakly-supervised Relevance Feedback (Open-WRF) method, which decreases the sensitivity to thresholds in OOD detection. We evaluate our approach on six benchmark datasets. Our results show that the proposed neighborhood aggregation method for OOD scores outperforms existing methods independent of the underlying graph neural network. Furthermore, we demonstrate that our Open-WRF method is more robust to threshold selection and analyze the influence of graph neighborhood on OOD detection. The aggregation and threshold methods are compatible with arbitrary graph neural networks and OOD detection methods, making our approach versatile and applicable to many real-world applications.
    A Unifying Framework for Learning Argumentation Semantics. (arXiv:2310.12309v1 [cs.AI])
    Argumentation is a very active research field of Artificial Intelligence concerned with the representation and evaluation of arguments used in dialogues between humans and/or artificial agents. Acceptability semantics of formal argumentation systems define the criteria for the acceptance or rejection of arguments. Several software systems, known as argumentation solvers, have been developed to compute the accepted/rejected arguments using such criteria. These include systems that learn to identify the accepted arguments using non-interpretable methods. In this paper we present a novel framework, which uses an Inductive Logic Programming approach to learn the acceptability semantics for several abstract and structured argumentation frameworks in an interpretable way. Through an empirical evaluation we show that our framework outperforms existing argumentation solvers, thus opening up new future research directions in the area of formal argumentation and human-machine dialogues.
    Operator-Based Detecting, Learning, and Stabilizing Unstable Periodic Orbits of Chaotic Attractors. (arXiv:2310.12156v1 [nlin.AO])
    This paper examines the use of operator-theoretic approaches to the analysis of chaotic systems through the lens of their unstable periodic orbits (UPOs). Our approach involves three data-driven steps for detecting, identifying, and stabilizing UPOs. We demonstrate the use of kernel integral operators within delay coordinates as an innovative method for UPO detection. For identifying the dynamic behavior associated with each individual UPO, we utilize the Koopman operator to present the dynamics as linear equations in the space of Koopman eigenfunctions. This allows for characterizing the chaotic attractor by investigating its principal dynamical modes across varying UPOs. We extend this methodology into an interpretable machine learning framework aimed at stabilizing strange attractors on their UPOs. To illustrate the efficacy of our approach, we apply it to the Lorenz attractor as a case study.
    Category-Agnostic 6D Pose Estimation with Conditional Neural Processes. (arXiv:2206.07162v2 [cs.CV] UPDATED)
    We present a novel meta-learning approach for 6D pose estimation on unknown objects. In contrast to ``instance-level" and ``category-level" pose estimation methods, our algorithm learns object representation in a category-agnostic way, which endows it with strong generalization capabilities across object categories. Specifically, we employ a neural process-based meta-learning approach to train an encoder to capture texture and geometry of an object in a latent representation, based on very few RGB-D images and ground-truth keypoints. The latent representation is then used by a simultaneously meta-trained decoder to predict the 6D pose of the object in new images. Furthermore, we propose a novel geometry-aware decoder for the keypoint prediction using a Graph Neural Network (GNN), which explicitly takes geometric constraints specific to each object into consideration. To evaluate our algorithm, extensive experiments are conducted on the \linemod dataset, and on our new fully-annotated synthetic datasets generated from Multiple Categories in Multiple Scenes (MCMS). Experimental results demonstrate that our model performs well on unseen objects with very different shapes and appearances. Remarkably, our model also shows robust performance on occluded scenes although trained fully on data without occlusion. To our knowledge, this is the first work exploring \textbf{cross-category level} 6D pose estimation.
    Julearn: an easy-to-use library for leakage-free evaluation and inspection of ML models. (arXiv:2310.12568v1 [cs.LG])
    The fast-paced development of machine learning (ML) methods coupled with its increasing adoption in research poses challenges for researchers without extensive training in ML. In neuroscience, for example, ML can help understand brain-behavior relationships, diagnose diseases, and develop biomarkers using various data sources like magnetic resonance imaging and electroencephalography. The primary objective of ML is to build models that can make accurate predictions on unseen data. Researchers aim to prove the existence of such generalizable models by evaluating performance using techniques such as cross-validation (CV), which uses systematic subsampling to estimate the generalization performance. Choosing a CV scheme and evaluating an ML pipeline can be challenging and, if used improperly, can lead to overestimated results and incorrect interpretations. We created julearn, an open-source Python library, that allow researchers to design and evaluate complex ML pipelines without encountering in common pitfalls. In this manuscript, we present the rationale behind julearn's design, its core features, and showcase three examples of previously-published research projects that can be easily implemented using this novel library. Julearn aims to simplify the entry into the ML world by providing an easy-to-use environment with built in guards against some of the most common ML pitfalls. With its design, unique features and simple interface, it poses as a useful Python-based library for research projects.
    Differentiable Vertex Fitting for Jet Flavour Tagging. (arXiv:2310.12804v1 [hep-ex])
    We propose a differentiable vertex fitting algorithm that can be used for secondary vertex fitting, and that can be seamlessly integrated into neural networks for jet flavour tagging. Vertex fitting is formulated as an optimization problem where gradients of the optimized solution vertex are defined through implicit differentiation and can be passed to upstream or downstream neural network components for network training. More broadly, this is an application of differentiable programming to integrate physics knowledge into neural network models in high energy physics. We demonstrate how differentiable secondary vertex fitting can be integrated into larger transformer-based models for flavour tagging and improve heavy flavour jet classification.
    Learning threshold neurons via the "edge of stability". (arXiv:2212.07469v2 [cs.LG] UPDATED)
    Existing analyses of neural network training often operate under the unrealistic assumption of an extremely small learning rate. This lies in stark contrast to practical wisdom and empirical studies, such as the work of J. Cohen et al. (ICLR 2021), which exhibit startling new phenomena (the "edge of stability" or "unstable convergence") and potential benefits for generalization in the large learning rate regime. Despite a flurry of recent works on this topic, however, the latter effect is still poorly understood. In this paper, we take a step towards understanding genuinely non-convex training dynamics with large learning rates by performing a detailed analysis of gradient descent for simplified models of two-layer neural networks. For these models, we provably establish the edge of stability phenomenon and discover a sharp phase transition for the step size below which the neural network fails to learn "threshold-like" neurons (i.e., neurons with a non-zero first-layer bias). This elucidates one possible mechanism by which the edge of stability can in fact lead to better generalization, as threshold neurons are basic building blocks with useful inductive bias for many tasks.
    DA-TransUNet: Integrating Spatial and Channel Dual Attention with Transformer U-Net for Medical Image Segmentation. (arXiv:2310.12570v1 [eess.IV])
    Great progress has been made in automatic medical image segmentation due to powerful deep representation learning. The influence of transformer has led to research into its variants, and large-scale replacement of traditional CNN modules. However, such trend often overlooks the intrinsic feature extraction capabilities of the transformer and potential refinements to both the model and the transformer module through minor adjustments. This study proposes a novel deep medical image segmentation framework, called DA-TransUNet, aiming to introduce the Transformer and dual attention block into the encoder and decoder of the traditional U-shaped architecture. Unlike prior transformer-based solutions, our DA-TransUNet utilizes attention mechanism of transformer and multifaceted feature extraction of DA-Block, which can efficiently combine global, local, and multi-scale features to enhance medical image segmentation. Meanwhile, experimental results show that a dual attention block is added before the Transformer layer to facilitate feature extraction in the U-net structure. Furthermore, incorporating dual attention blocks in skip connections can enhance feature transfer to the decoder, thereby improving image segmentation performance. Experimental results across various benchmark of medical image segmentation reveal that DA-TransUNet significantly outperforms the state-of-the-art methods. The codes and parameters of our model will be publicly available at https://github.com/SUN-1024/DA-TransUnet.
    Canonical normalizing flows for manifold learning. (arXiv:2310.12743v1 [stat.ML])
    Manifold learning flows are a class of generative modelling techniques that assume a low-dimensional manifold description of the data. The embedding of such manifold into the high-dimensional space of the data is achieved via learnable invertible transformations. Therefore, once the manifold is properly aligned via a reconstruction loss, the probability density is tractable on the manifold and maximum likelihood can be used optimize the network parameters. Naturally, the lower-dimensional representation of the data requires an injective-mapping. Recent approaches were able to enforce that density aligns with the modelled manifold, while efficiently calculating the density volume-change term when embedding to the higher-dimensional space. However, unless the injective-mapping is analytically predefined, the learned manifold is not necessarily an efficient representation of the data. Namely, the latent dimensions of such models frequently learn an entangled intrinsic basis with degenerate information being stored in each dimension. Alternatively, if a locally orthogonal and/or sparse basis is to be learned, here coined canonical intrinsic basis, it can serve in learning a more compact latent space representation. Towards this end, we propose a canonical manifold learning flow method, where a novel optimization objective enforces the transformation matrix to have few prominent and orthogonal basis functions. Canonical manifold flow yields a more efficient use of the latent space, automatically generating fewer prominent and distinct dimensions to represent data, and consequently a better approximation of target distributions than other manifold flow methods in most experiments we conducted, resulting in lower FID scores.
    Transformer-based Entity Legal Form Classification. (arXiv:2310.12766v1 [cs.CL])
    We propose the application of Transformer-based language models for classifying entity legal forms from raw legal entity names. Specifically, we employ various BERT variants and compare their performance against multiple traditional baselines. Our evaluation encompasses a substantial subset of freely available Legal Entity Identifier (LEI) data, comprising over 1.1 million legal entities from 30 different legal jurisdictions. The ground truth labels for classification per jurisdiction are taken from the Entity Legal Form (ELF) code standard (ISO 20275). Our findings demonstrate that pre-trained BERT variants outperform traditional text classification approaches in terms of F1 score, while also performing comparably well in the Macro F1 Score. Moreover, the validity of our proposal is supported by the outcome of third-party expert reviews conducted in ten selected jurisdictions. This study highlights the significant potential of Transformer-based models in advancing data standardization and data integration. The presented approaches can greatly benefit financial institutions, corporations, governments and other organizations in assessing business relationships, understanding risk exposure, and promoting effective governance.
    Towards Enhanced Local Explainability of Random Forests: a Proximity-Based Approach. (arXiv:2310.12428v1 [stat.ML])
    We initiate a novel approach to explain the out of sample performance of random forest (RF) models by exploiting the fact that any RF can be formulated as an adaptive weighted K nearest-neighbors model. Specifically, we use the proximity between points in the feature space learned by the RF to re-write random forest predictions exactly as a weighted average of the target labels of training data points. This linearity facilitates a local notion of explainability of RF predictions that generates attributions for any model prediction across observations in the training set, and thereby complements established methods like SHAP, which instead generates attributions for a model prediction across dimensions of the feature space. We demonstrate this approach in the context of a bond pricing model trained on US corporate bond trades, and compare our approach to various existing approaches to model explainability.
    AI Potentiality and Awareness: A Position Paper from the Perspective of Human-AI Teaming in Cybersecurity. (arXiv:2310.12162v1 [cs.CR])
    This position paper explores the broad landscape of AI potentiality in the context of cybersecurity, with a particular emphasis on its possible risk factors with awareness, which can be managed by incorporating human experts in the loop, i.e., "Human-AI" teaming. As artificial intelligence (AI) technologies advance, they will provide unparalleled opportunities for attack identification, incident response, and recovery. However, the successful deployment of AI into cybersecurity measures necessitates an in-depth understanding of its capabilities, challenges, and ethical and legal implications to handle associated risk factors in real-world application areas. Towards this, we emphasize the importance of a balanced approach that incorporates AI's computational power with human expertise. AI systems may proactively discover vulnerabilities and detect anomalies through pattern recognition, and predictive modeling, significantly enhancing speed and accuracy. Human experts can explain AI-generated decisions to stakeholders, regulators, and end-users in critical situations, ensuring responsibility and accountability, which helps establish trust in AI-driven security solutions. Therefore, in this position paper, we argue that human-AI teaming is worthwhile in cybersecurity, in which human expertise such as intuition, critical thinking, or contextual understanding is combined with AI's computational power to improve overall cyber defenses.
    Preliminary studies: Comparing LSTM and BLSTM Deep Neural Networks for Power Consumption Prediction. (arXiv:2305.16546v2 [cs.LG] UPDATED)
    Electric consumption prediction methods are investigated for many reasons such as decision-making related to energy efficiency as well as for anticipating demand in the energy market dynamics. The objective of the present work is the comparison between two Deep Learning models, namely the Long Short-Term Memory (LSTM) and Bi-directional LSTM (BLSTM) for univariate electric consumption Time Series (TS) short-term forecast. The Data Sets (DSs) were selected for their different contexts and scales, aiming the assessment of the models' robustness. Four DSs were used, related to the power consumption of: (a) a household in France; (b) a university building in Santar\'em, Brazil; (c) the T\'etouan city zones, in Morocco; and (c) the Singapore aggregated electric demand. The metrics RMSE, MAE, MAPE and R2 were calculated in a TS cross-validation scheme. The Friedman's test was applied to normalized RMSE (NRMSE) results, showing that BLSTM outperforms LSTM with statistically significant difference (p = 0.0455), corroborating the fact that bidirectional weight updating improves significantly the LSTM performance concerning different scales of electric power consumption.
    Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing. (arXiv:2310.12404v1 [cs.SD])
    Creating music is iterative, requiring varied methods at each stage. However, existing AI music systems fall short in orchestrating multiple subsystems for diverse needs. To address this gap, we introduce Loop Copilot, a novel system that enables users to generate and iteratively refine music through an interactive, multi-round dialogue interface. The system uses a large language model to interpret user intentions and select appropriate AI models for task execution. Each backend model is specialized for a specific task, and their outputs are aggregated to meet the user's requirements. To ensure musical coherence, essential attributes are maintained in a centralized table. We evaluate the effectiveness of the proposed system through semi-structured interviews and questionnaires, highlighting its utility not only in facilitating music creation but also its potential for broader applications.
    Knowledge from Uncertainty in Evidential Deep Learning. (arXiv:2310.12663v1 [cs.LG])
    This work reveals an evidential signal that emerges from the uncertainty value in Evidential Deep Learning (EDL). EDL is one example of a class of uncertainty-aware deep learning approaches designed to provide confidence (or epistemic uncertainty) about the current test sample. In particular for computer vision and bidirectional encoder large language models, the `evidential signal' arising from the Dirichlet strength in EDL can, in some cases, discriminate between classes, which is particularly strong when using large language models. We hypothesise that the KL regularisation term causes EDL to couple aleatoric and epistemic uncertainty. In this paper, we empirically investigate the correlations between misclassification and evaluated uncertainty, and show that EDL's `evidential signal' is due to misclassification bias. We critically evaluate EDL with other Dirichlet-based approaches, namely Generative Evidential Neural Networks (EDL-GEN) and Prior Networks, and show theoretically and empirically the differences between these loss functions. We conclude that EDL's coupling of uncertainty arises from these differences due to the use (or lack) of out-of-distribution samples during training.
    Rethinking Complex Queries on Knowledge Graphs with Neural Link Predictors. (arXiv:2304.07063v3 [cs.AI] UPDATED)
    Reasoning on knowledge graphs is a challenging task because it utilizes observed information to predict the missing one. Particularly, answering complex queries based on first-order logic is one of the crucial tasks to verify learning to reason abilities for generalization and composition. Recently, the prevailing method is query embedding which learns the embedding of a set of entities and treats logic operations as set operations and has shown great empirical success. Though there has been much research following the same formulation, many of its claims lack a formal and systematic inspection. In this paper, we rethink this formulation and justify many of the previous claims by characterizing the scope of queries investigated previously and precisely identifying the gap between its formulation and its goal, as well as providing complexity analysis for the currently investigated queries. Moreover, we develop a new dataset containing ten new types of queries with features that have never been considered and therefore can provide a thorough investigation of complex queries. Finally, we propose a new neural-symbolic method, Fuzzy Inference with Truth value (FIT), where we equip the neural link predictors with fuzzy logic theory to support end-to-end learning using complex queries with provable reasoning capability. Empirical results show that our method outperforms previous methods significantly in the new dataset and also surpasses previous methods in the existing dataset at the same time.
    2D-3D Interlaced Transformer for Point Cloud Segmentation with Scene-Level Supervision. (arXiv:2310.12817v1 [cs.CV])
    We present a Multimodal Interlaced Transformer (MIT) that jointly considers 2D and 3D data for weakly supervised point cloud segmentation. Research studies have shown that 2D and 3D features are complementary for point cloud segmentation. However, existing methods require extra 2D annotations to achieve 2D-3D information fusion. Considering the high annotation cost of point clouds, effective 2D and 3D feature fusion based on weakly supervised learning is in great demand. To this end, we propose a transformer model with two encoders and one decoder for weakly supervised point cloud segmentation using only scene-level class tags. Specifically, the two encoders compute the self-attended features for 3D point clouds and 2D multi-view images, respectively. The decoder implements interlaced 2D-3D cross-attention and carries out implicit 2D and 3D feature fusion. We alternately switch the roles of queries and key-value pairs in the decoder layers. It turns out that the 2D and 3D features are iteratively enriched by each other. Experiments show that it performs favorably against existing weakly supervised point cloud segmentation methods by a large margin on the S3DIS and ScanNet benchmarks. The project page will be available at https://jimmy15923.github.io/mit_web/.
    Model Merging by Uncertainty-Based Gradient Matching. (arXiv:2310.12808v1 [cs.LG])
    Models trained on different datasets can be merged by a weighted-averaging of their parameters, but why does it work and when can it fail? Here, we connect the inaccuracy of weighted-averaging to mismatches in the gradients and propose a new uncertainty-based scheme to improve the performance by reducing the mismatch. The connection also reveals implicit assumptions in other schemes such as averaging, task arithmetic, and Fisher-weighted averaging. Our new method gives consistent improvements for large language models and vision transformers, both in terms of performance and robustness to hyperparameters.
    Knowledge-Augmented Language Model Verification. (arXiv:2310.12836v1 [cs.CL])
    Recent Language Models (LMs) have shown impressive capabilities in generating texts with the knowledge internalized in parameters. Yet, LMs often generate the factually incorrect responses to the given queries, since their knowledge may be inaccurate, incomplete, and outdated. To address this problem, previous works propose to augment LMs with the knowledge retrieved from an external knowledge source. However, such approaches often show suboptimal text generation performance due to two reasons: 1) the model may fail to retrieve the knowledge relevant to the given query, or 2) the model may not faithfully reflect the retrieved knowledge in the generated text. To overcome these, we propose to verify the output and the knowledge of the knowledge-augmented LMs with a separate verifier, which is a small LM that is trained to detect those two types of errors through instruction-finetuning. Then, when the verifier recognizes an error, we can rectify it by either retrieving new knowledge or generating new text. Further, we use an ensemble of the outputs from different instructions with a single verifier to enhance the reliability of the verification processes. We validate the effectiveness of the proposed verification steps on multiple question answering benchmarks, whose results show that the proposed verifier effectively identifies retrieval and generation errors, allowing LMs to provide more factually correct outputs. Our code is available at https://github.com/JinheonBaek/KALMV.
    AgentTuning: Enabling Generalized Agent Abilities for LLMs. (arXiv:2310.12823v1 [cs.CL])
    Open large language models (LLMs) with great performance in various tasks have significantly advanced the development of LLMs. However, they are far inferior to commercial models such as ChatGPT and GPT-4 when acting as agents to tackle complex tasks in the real world. These agent tasks employ LLMs as the central controller responsible for planning, memorization, and tool utilization, necessitating both fine-grained prompting methods and robust LLMs to achieve satisfactory performance. Though many prompting methods have been proposed to complete particular agent tasks, there is lack of research focusing on improving the agent capabilities of LLMs themselves without compromising their general abilities. In this work, we present AgentTuning, a simple and general method to enhance the agent abilities of LLMs while maintaining their general LLM capabilities. We construct AgentInstruct, a lightweight instruction-tuning dataset containing high-quality interaction trajectories. We employ a hybrid instruction-tuning strategy by combining AgentInstruct with open-source instructions from general domains. AgentTuning is used to instruction-tune the Llama 2 series, resulting in AgentLM. Our evaluations show that AgentTuning enables LLMs' agent capabilities without compromising general abilities. The AgentLM-70B is comparable to GPT-3.5-turbo on unseen agent tasks, demonstrating generalized agent capabilities. We open source the AgentInstruct and AgentLM-7B, 13B, and 70B models at https://github.com/THUDM/AgentTuning , serving open and powerful alternatives to commercial LLMs for agent tasks.
    Learn from the Past: A Proxy based Adversarial Defense Framework to Boost Robustness. (arXiv:2310.12713v1 [cs.LG])
    In light of the vulnerability of deep learning models to adversarial samples and the ensuing security issues, a range of methods, including Adversarial Training (AT) as a prominent representative, aimed at enhancing model robustness against various adversarial attacks, have seen rapid development. However, existing methods essentially assist the current state of target model to defend against parameter-oriented adversarial attacks with explicit or implicit computation burdens, which also suffers from unstable convergence behavior due to inconsistency of optimization trajectories. Diverging from previous work, this paper reconsiders the update rule of target model and corresponding deficiency to defend based on its current state. By introducing the historical state of the target model as a proxy, which is endowed with much prior information for defense, we formulate a two-stage update rule, resulting in a general adversarial defense framework, which we refer to as `LAST' ({\bf L}earn from the P{\bf ast}). Besides, we devise a Self Distillation (SD) based defense objective to constrain the update process of the proxy model without the introduction of larger teacher models. Experimentally, we demonstrate consistent and significant performance enhancements by refining a series of single-step and multi-step AT methods (e.g., up to $\bf 9.2\%$ and $\bf 20.5\%$ improvement of Robust Accuracy (RA) on CIFAR10 and CIFAR100 datasets, respectively) across various datasets, backbones and attack modalities, and validate its ability to enhance training stability and ameliorate catastrophic overfitting issues meanwhile.
    OceanGPT: A Large Language Model for Ocean Science Tasks. (arXiv:2310.02031v3 [cs.CL] UPDATED)
    Ocean science, which delves into the oceans that are reservoirs of life and biodiversity, is of great significance given that oceans cover over 70% of our planet's surface. Recently, advances in Large Language Models (LLMs) have transformed the paradigm in science. Despite the success in other domains, current LLMs often fall short in catering to the needs of domain experts like oceanographers, and the potential of LLMs for ocean science is under-explored. The intrinsic reason may be the immense and intricate nature of ocean data as well as the necessity for higher granularity and richness in knowledge. To alleviate these issues, we introduce OceanGPT, the first-ever LLM in the ocean domain, which is expert in various ocean science tasks. We propose DoInstruct, a novel framework to automatically obtain a large volume of ocean domain instruction data, which generates instructions based on multi-agent collaboration. Additionally, we construct the first oceanography benchmark, OceanBench, to evaluate the capabilities of LLMs in the ocean domain. Though comprehensive experiments, OceanGPT not only shows a higher level of knowledge expertise for oceans science tasks but also gains preliminary embodied intelligence capabilities in ocean technology. Codes, data and checkpoints will soon be available at https://github.com/zjunlp/KnowLM.
    Causal Similarity-Based Hierarchical Bayesian Models. (arXiv:2310.12595v1 [cs.LG])
    The key challenge underlying machine learning is generalisation to new data. This work studies generalisation for datasets consisting of related tasks that may differ in causal mechanisms. For example, observational medical data for complex diseases suffers from heterogeneity in causal mechanisms of disease across patients, creating challenges for machine learning algorithms that need to generalise to new patients outside of the training dataset. Common approaches for learning supervised models with heterogeneous datasets include learning a global model for the entire dataset, learning local models for each tasks' data, or utilising hierarchical, meta-learning and multi-task learning approaches to learn how to generalise from data pooled across multiple tasks. In this paper we propose causal similarity-based hierarchical Bayesian models to improve generalisation to new tasks by learning how to pool data from training tasks with similar causal mechanisms. We apply this general modelling principle to Bayesian neural networks and compare a variety of methods for estimating causal task similarity (for both known and unknown causal models). We demonstrate the benefits of our approach and applicability to real world problems through a range of experiments on simulated and real data.
    Unmasking Transformers: A Theoretical Approach to Data Recovery via Attention Weights. (arXiv:2310.12462v1 [cs.LG])
    In the realm of deep learning, transformers have emerged as a dominant architecture, particularly in natural language processing tasks. However, with their widespread adoption, concerns regarding the security and privacy of the data processed by these models have arisen. In this paper, we address a pivotal question: Can the data fed into transformers be recovered using their attention weights and outputs? We introduce a theoretical framework to tackle this problem. Specifically, we present an algorithm that aims to recover the input data $X \in \mathbb{R}^{d \times n}$ from given attention weights $W = QK^\top \in \mathbb{R}^{d \times d}$ and output $B \in \mathbb{R}^{n \times n}$ by minimizing the loss function $L(X)$. This loss function captures the discrepancy between the expected output and the actual output of the transformer. Our findings have significant implications for the Localized Layer-wise Mechanism (LLM), suggesting potential vulnerabilities in the model's design from a security and privacy perspective. This work underscores the importance of understanding and safeguarding the internal workings of transformers to ensure the confidentiality of processed data.
    Conditional Density Estimations from Privacy-Protected Data. (arXiv:2310.12781v1 [stat.ML])
    Many modern statistical analysis and machine learning applications require training models on sensitive user data. Differential privacy provides a formal guarantee that individual-level information about users does not leak. In this framework, randomized algorithms inject calibrated noise into the confidential data, resulting in privacy-protected datasets or queries. However, restricting access to only the privatized data during statistical analysis makes it computationally challenging to perform valid inferences on parameters underlying the confidential data. In this work, we propose simulation-based inference methods from privacy-protected datasets. Specifically, we use neural conditional density estimators as a flexible family of distributions to approximate the posterior distribution of model parameters given the observed private query results. We illustrate our methods on discrete time-series data under an infectious disease model and on ordinary linear regression models. Illustrating the privacy-utility trade-off, our experiments and analysis demonstrate the necessity and feasibility of designing valid statistical inference procedures to correct for biases introduced by the privacy-protection mechanisms.
    Boosting Inference Efficiency: Unleashing the Power of Parameter-Shared Pre-trained Language Models. (arXiv:2310.12818v1 [cs.CL])
    Parameter-shared pre-trained language models (PLMs) have emerged as a successful approach in resource-constrained environments, enabling substantial reductions in model storage and memory costs without significant performance compromise. However, it is important to note that parameter sharing does not alleviate computational burdens associated with inference, thus impeding its practicality in situations characterized by limited stringent latency requirements or computational resources. Building upon neural ordinary differential equations (ODEs), we introduce a straightforward technique to enhance the inference efficiency of parameter-shared PLMs. Additionally, we propose a simple pre-training technique that leads to fully or partially shared models capable of achieving even greater inference acceleration. The experimental results demonstrate the effectiveness of our methods on both autoregressive and autoencoding PLMs, providing novel insights into more efficient utilization of parameter-shared models in resource-constrained settings.
    Fast Model Debias with Machine Unlearning. (arXiv:2310.12560v1 [cs.LG])
    Recent discoveries have revealed that deep neural networks might behave in a biased manner in many real-world scenarios. For instance, deep networks trained on a large-scale face recognition dataset CelebA tend to predict blonde hair for females and black hair for males. Such biases not only jeopardize the robustness of models but also perpetuate and amplify social biases, which is especially concerning for automated decision-making processes in healthcare, recruitment, etc., as they could exacerbate unfair economic and social inequalities among different groups. Existing debiasing methods suffer from high costs in bias labeling or model re-training, while also exhibiting a deficiency in terms of elucidating the origins of biases within the model. To this respect, we propose a fast model debiasing framework (FMD) which offers an efficient approach to identify, evaluate and remove biases inherent in trained models. The FMD identifies biased attributes through an explicit counterfactual concept and quantifies the influence of data samples with influence functions. Moreover, we design a machine unlearning-based strategy to efficiently and effectively remove the bias in a trained model with a small counterfactual dataset. Experiments on the Colored MNIST, CelebA, and Adult Income datasets along with experiments with large language models demonstrate that our method achieves superior or competing accuracies compared with state-of-the-art methods while attaining significantly fewer biases and requiring much less debiasing cost. Notably, our method requires only a small external dataset and updating a minimal amount of model parameters, without the requirement of access to training data that may be too large or unavailable in practice.
    An Improved Metarounding Algorithm via Frank-Wolfe. (arXiv:2310.12629v1 [cs.DS])
    Metarounding is an approach to convert an approximation algorithm for linear optimization over some combinatorial classes to an online linear optimization algorithm for the same class. We propose a new metarounding algorithm under a natural assumption that a relax-based approximation algorithm exists for the combinatorial class. Our algorithm is much more efficient in both theoretical and practical aspects.
    Label-Aware Automatic Verbalizer for Few-Shot Text Classification. (arXiv:2310.12778v1 [cs.CL])
    Prompt-based learning has shown its effectiveness in few-shot text classification. One important factor in its success is a verbalizer, which translates output from a language model into a predicted class. Notably, the simplest and widely acknowledged verbalizer employs manual labels to represent the classes. However, manual selection does not guarantee the optimality of the selected words when conditioned on the chosen language model. Therefore, we propose Label-Aware Automatic Verbalizer (LAAV), effectively augmenting the manual labels to achieve better few-shot classification results. Specifically, we use the manual labels along with the conjunction "and" to induce the model to generate more effective words for the verbalizer. The experimental results on five datasets across five languages demonstrate that LAAV significantly outperforms existing verbalizers. Furthermore, our analysis reveals that LAAV suggests more relevant words compared to similar approaches, especially in mid-to-low resource languages.
    MTS-LOF: Medical Time-Series Representation Learning via Occlusion-Invariant Features. (arXiv:2310.12451v1 [cs.LG])
    Medical time series data are indispensable in healthcare, providing critical insights for disease diagnosis, treatment planning, and patient management. The exponential growth in data complexity, driven by advanced sensor technologies, has presented challenges related to data labeling. Self-supervised learning (SSL) has emerged as a transformative approach to address these challenges, eliminating the need for extensive human annotation. In this study, we introduce a novel framework for Medical Time Series Representation Learning, known as MTS-LOF. MTS-LOF leverages the strengths of contrastive learning and Masked Autoencoder (MAE) methods, offering a unique approach to representation learning for medical time series data. By combining these techniques, MTS-LOF enhances the potential of healthcare applications by providing more sophisticated, context-rich representations. Additionally, MTS-LOF employs a multi-masking strategy to facilitate occlusion-invariant feature learning. This approach allows the model to create multiple views of the data by masking portions of it. By minimizing the discrepancy between the representations of these masked patches and the fully visible patches, MTS-LOF learns to capture rich contextual information within medical time series datasets. The results of experiments conducted on diverse medical time series datasets demonstrate the superiority of MTS-LOF over other methods. These findings hold promise for significantly enhancing healthcare applications by improving representation learning. Furthermore, our work delves into the integration of joint-embedding SSL and MAE techniques, shedding light on the intricate interplay between temporal and structural dependencies in healthcare data. This understanding is crucial, as it allows us to grasp the complexities of healthcare data analysis.
    Improved Operator Learning by Orthogonal Attention. (arXiv:2310.12487v1 [cs.LG])
    Neural operators, as an efficient surrogate model for learning the solutions of PDEs, have received extensive attention in the field of scientific machine learning. Among them, attention-based neural operators have become one of the mainstreams in related research. However, existing approaches overfit the limited training data due to the considerable number of parameters in the attention mechanism. To address this, we develop an orthogonal attention based on the eigendecomposition of the kernel integral operator and the neural approximation of eigenfunctions. The orthogonalization naturally poses a proper regularization effect on the resulting neural operator, which aids in resisting overfitting and boosting generalization. Experiments on six standard neural operator benchmark datasets comprising both regular and irregular geometries show that our method can outperform competing baselines with decent margins.
    Attack Prompt Generation for Red Teaming and Defending Large Language Models. (arXiv:2310.12505v1 [cs.CL])
    Large language models (LLMs) are susceptible to red teaming attacks, which can induce LLMs to generate harmful content. Previous research constructs attack prompts via manual or automatic methods, which have their own limitations on construction cost and quality. To address these issues, we propose an integrated approach that combines manual and automatic methods to economically generate high-quality attack prompts. Specifically, considering the impressive capabilities of newly emerged LLMs, we propose an attack framework to instruct LLMs to mimic human-generated prompts through in-context learning. Furthermore, we propose a defense framework that fine-tunes victim LLMs through iterative interactions with the attack framework to enhance their safety against red teaming attacks. Extensive experiments on different LLMs validate the effectiveness of our proposed attack and defense frameworks. Additionally, we release a series of attack prompts datasets named SAP with varying sizes, facilitating the safety evaluation and enhancement of more LLMs. Our code and dataset is available on https://github.com/Aatrox103/SAP .
    Piecewise Deterministic Markov Processes for Bayesian Neural Networks. (arXiv:2302.08724v2 [stat.ML] UPDATED)
    Inference on modern Bayesian Neural Networks (BNNs) often relies on a variational inference treatment, imposing violated assumptions of independence and the form of the posterior. Traditional MCMC approaches avoid these assumptions at the cost of increased computation due to its incompatibility to subsampling of the likelihood. New Piecewise Deterministic Markov Process (PDMP) samplers permit subsampling, though introduce a model specific inhomogenous Poisson Process (IPPs) which is difficult to sample from. This work introduces a new generic and adaptive thinning scheme for sampling from these IPPs, and demonstrates how this approach can accelerate the application of PDMPs for inference in BNNs. Experimentation illustrates how inference with these methods is computationally feasible, can improve predictive accuracy, MCMC mixing performance, and provide informative uncertainty measurements when compared against other approximate inference schemes.
    Seeking the Truth Beyond the Data. An Unsupervised Machine Learning Approach. (arXiv:2207.06949v4 [stat.ML] UPDATED)
    Clustering is an unsupervised machine learning methodology where unlabeled elements/objects are grouped together aiming to the construction of well-established clusters that their elements are classified according to their similarity. The goal of this process is to provide a useful aid to the researcher that will help her/him to identify patterns among the data. Dealing with large databases, such patterns may not be easily detectable without the contribution of a clustering algorithm. This article provides a deep description of the most widely used clustering methodologies accompanied by useful presentations concerning suitable parameter selection and initializations. Simultaneously, this article not only represents a review highlighting the major elements of examined clustering techniques but emphasizes the comparison of these algorithms' clustering efficiency based on 3 datasets, revealing their existing weaknesses and capabilities through accuracy and complexity, during the confrontation of discrete and continuous observations. The produced results help us extract valuable conclusions about the appropriateness of the examined clustering techniques in accordance with the dataset's size.
    OODRobustBench: benchmarking and analyzing adversarial robustness under distribution shift. (arXiv:2310.12793v1 [cs.LG])
    Existing works have made great progress in improving adversarial robustness, but typically test their method only on data from the same distribution as the training data, i.e. in-distribution (ID) testing. As a result, it is unclear how such robustness generalizes under input distribution shifts, i.e. out-of-distribution (OOD) testing. This is a concerning omission as such distribution shifts are unavoidable when methods are deployed in the wild. To address this issue we propose a benchmark named OODRobustBench to comprehensively assess OOD adversarial robustness using 23 dataset-wise shifts (i.e. naturalistic shifts in input distribution) and 6 threat-wise shifts (i.e., unforeseen adversarial threat models). OODRobustBench is used to assess 706 robust models using 60.7K adversarial evaluations. This large-scale analysis shows that: 1) adversarial robustness suffers from a severe OOD generalization issue; 2) ID robustness correlates strongly with OOD robustness, in a positive linear way, under many distribution shifts. The latter enables the prediction of OOD robustness from ID robustness. Based on this, we are able to predict the upper limit of OOD robustness for existing robust training schemes. The results suggest that achieving OOD robustness requires designing novel methods beyond the conventional ones. Last, we discover that extra data, data augmentation, advanced model architectures and particular regularization approaches can improve OOD robustness. Noticeably, the discovered training schemes, compared to the baseline, exhibit dramatically higher robustness under threat shift while keeping high ID robustness, demonstrating new promising solutions for robustness against both multi-attack and unforeseen attacks.
    Energy-Based Models For Speech Synthesis. (arXiv:2310.12765v1 [cs.SD])
    Recently there has been a lot of interest in non-autoregressive (non-AR) models for speech synthesis, such as FastSpeech 2 and diffusion models. Unlike AR models, these models do not have autoregressive dependencies among outputs which makes inference efficient. This paper expands the range of available non-AR models with another member called energy-based models (EBMs). The paper describes how noise contrastive estimation, which relies on the comparison between positive and negative samples, can be used to train EBMs. It proposes a number of strategies for generating effective negative samples, including using high-performing AR models. It also describes how sampling from EBMs can be performed using Langevin Markov Chain Monte-Carlo (MCMC). The use of Langevin MCMC enables to draw connections between EBMs and currently popular diffusion models. Experiments on LJSpeech dataset show that the proposed approach offers improvements over Tacotron 2.
    Constructing Impactful Machine Learning Research for Astronomy: Best Practices for Researchers and Reviewers. (arXiv:2310.12528v1 [astro-ph.IM])
    Machine learning has rapidly become a tool of choice for the astronomical community. It is being applied across a wide range of wavelengths and problems, from the classification of transients to neural network emulators of cosmological simulations, and is shifting paradigms about how we generate and report scientific results. At the same time, this class of method comes with its own set of best practices, challenges, and drawbacks, which, at present, are often reported on incompletely in the astrophysical literature. With this paper, we aim to provide a primer to the astronomical community, including authors, reviewers, and editors, on how to implement machine learning models and report their results in a way that ensures the accuracy of the results, reproducibility of the findings, and usefulness of the method.
    Document-Level Language Models for Machine Translation. (arXiv:2310.12303v1 [cs.CL])
    Despite the known limitations, most machine translation systems today still operate on the sentence-level. One reason for this is, that most parallel training data is only sentence-level aligned, without document-level meta information available. In this work, we set out to build context-aware translation systems utilizing document-level monolingual data instead. This can be achieved by combining any existing sentence-level translation model with a document-level language model. We improve existing approaches by leveraging recent advancements in model combination. Additionally, we propose novel weighting techniques that make the system combination more flexible and significantly reduce computational overhead. In a comprehensive evaluation on four diverse translation tasks, we show that our extensions improve document-targeted scores substantially and are also computationally more efficient. However, we also find that in most scenarios, back-translation gives even better results, at the cost of having to re-train the translation system. Finally, we explore language model fusion in the light of recent advancements in large language models. Our findings suggest that there might be strong potential in utilizing large language models via model combination.  ( 2 min )
    Improving SCGAN's Similarity Constraint and Learning a Better Disentangled Representation. (arXiv:2310.12262v1 [cs.CV])
    SCGAN adds a similarity constraint between generated images and conditions as a regularization term on generative adversarial networks. Similarity constraint works as a tutor to instruct the generator network to comprehend the difference of representations based on conditions. We understand how SCGAN works on a deeper level. This understanding makes us realize that the similarity constraint functions like the contrastive loss function. We believe that a model with high understanding and intelligence measures the similarity between images based on their structure and high level features, just like humans do. Two major changes we applied to SCGAN in order to make a modified model are using SSIM to measure similarity between images and applying contrastive loss principles to the similarity constraint. The modified model performs better using FID and FactorVAE metrics. The modified model also has better generalisability compared to other models. Keywords Generative Adversarial Nets, Unsupervised Learning, Disentangled Representation Learning, Contrastive Disentanglement, SSIM  ( 2 min )
    American Option Pricing using Self-Attention GRU and Shapley Value Interpretation. (arXiv:2310.12500v1 [q-fin.PR])
    Options, serving as a crucial financial instrument, are used by investors to manage and mitigate their investment risks within the securities market. Precisely predicting the present price of an option enables investors to make informed and efficient decisions. In this paper, we propose a machine learning method for forecasting the prices of SPY (ETF) option based on gated recurrent unit (GRU) and self-attention mechanism. We first partitioned the raw dataset into 15 subsets according to moneyness and days to maturity criteria. For each subset, we matched the corresponding U.S. government bond rates and Implied Volatility Indices. This segmentation allows for a more insightful exploration of the impacts of risk-free rates and underlying volatility on option pricing. Next, we built four different machine learning models, including multilayer perceptron (MLP), long short-term memory (LSTM), self-attention LSTM, and self-attention GRU in comparison to the traditional binomial model. The empirical result shows that self-attention GRU with historical data outperforms other models due to its ability to capture complex temporal dependencies and leverage the contextual information embedded in the historical data. Finally, in order to unveil the "black box" of artificial intelligence, we employed the SHapley Additive exPlanations (SHAP) method to interpret and analyze the prediction results of the self-attention GRU model with historical data. This provides insights into the significance and contributions of different input features on the pricing of American-style options.  ( 2 min )
    Opportunities for Adaptive Experiments to Enable Continuous Improvement that Trades-off Instructor and Researcher Incentives. (arXiv:2310.12324v1 [cs.HC])
    Randomized experimental comparisons of alternative pedagogical strategies could provide useful empirical evidence in instructors' decision-making. However, traditional experiments do not have a clear and simple pathway to using data rapidly to try to increase the chances that students in an experiment get the best conditions. Drawing inspiration from the use of machine learning and experimentation in product development at leading technology companies, we explore how adaptive experimentation might help in continuous course improvement. In adaptive experiments, as different arms/conditions are deployed to students, data is analyzed and used to change the experience for future students. This can be done using machine learning algorithms to identify which actions are more promising for improving student experience or outcomes. This algorithm can then dynamically deploy the most effective conditions to future students, resulting in better support for students' needs. We illustrate the approach with a case study providing a side-by-side comparison of traditional and adaptive experimentation of self-explanation prompts in online homework problems in a CS1 course. This provides a first step in exploring the future of how this methodology can be useful in bridging research and practice in doing continuous improvement.  ( 2 min )
    WeaveNet for Approximating Two-sided Matching Problems. (arXiv:2310.12515v1 [cs.LG])
    Matching, a task to optimally assign limited resources under constraints, is a fundamental technology for society. The task potentially has various objectives, conditions, and constraints; however, the efficient neural network architecture for matching is underexplored. This paper proposes a novel graph neural network (GNN), \textit{WeaveNet}, designed for bipartite graphs. Since a bipartite graph is generally dense, general GNN architectures lose node-wise information by over-smoothing when deeply stacked. Such a phenomenon is undesirable for solving matching problems. WeaveNet avoids it by preserving edge-wise information while passing messages densely to reach a better solution. To evaluate the model, we approximated one of the \textit{strongly NP-hard} problems, \textit{fair stable matching}. Despite its inherent difficulties and the network's general purpose design, our model reached a comparative performance with state-of-the-art algorithms specially designed for stable matching for small numbers of agents.  ( 2 min )
    Enhanced Graph Neural Networks with Ego-Centric Spectral Subgraph Embeddings Augmentation. (arXiv:2310.12169v1 [cs.SI])
    Graph Neural Networks (GNNs) have shown remarkable merit in performing various learning-based tasks in complex networks. The superior performance of GNNs often correlates with the availability and quality of node-level features in the input networks. However, for many network applications, such node-level information may be missing or unreliable, thereby limiting the applicability and efficacy of GNNs. To address this limitation, we present a novel approach denoted as Ego-centric Spectral subGraph Embedding Augmentation (ESGEA), which aims to enhance and design node features, particularly in scenarios where information is lacking. Our method leverages the topological structure of the local subgraph to create topology-aware node features. The subgraph features are generated using an efficient spectral graph embedding technique, and they serve as node features that capture the local topological organization of the network. The explicit node features, if present, are then enhanced with the subgraph embeddings in order to improve the overall performance. ESGEA is compatible with any GNN-based architecture and is effective even in the absence of node features. We evaluate the proposed method in a social network graph classification task where node attributes are unavailable, as well as in a node classification task where node features are corrupted or even absent. The evaluation results on seven datasets and eight baseline models indicate up to a 10% improvement in AUC and a 7% improvement in accuracy for graph and node classification tasks, respectively.  ( 3 min )
    Equipping Federated Graph Neural Networks with Structure-aware Group Fairness. (arXiv:2310.12350v1 [cs.LG])
    Graph Neural Networks (GNNs) have been widely used for various types of graph data processing and analytical tasks in different domains. Training GNNs over centralized graph data can be infeasible due to privacy concerns and regulatory restrictions. Thus, federated learning (FL) becomes a trending solution to address this challenge in a distributed learning paradigm. However, as GNNs may inherit historical bias from training data and lead to discriminatory predictions, the bias of local models can be easily propagated to the global model in distributed settings. This poses a new challenge in mitigating bias in federated GNNs. To address this challenge, we propose $\text{F}^2$GNN, a Fair Federated Graph Neural Network, that enhances group fairness of federated GNNs. As bias can be sourced from both data and learning algorithms, $\text{F}^2$GNN aims to mitigate both types of bias under federated settings. First, we provide theoretical insights on the connection between data bias in a training graph and statistical fairness metrics of the trained GNN models. Based on the theoretical analysis, we design $\text{F}^2$GNN which contains two key components: a fairness-aware local model update scheme that enhances group fairness of the local models on the client side, and a fairness-weighted global model update scheme that takes both data bias and fairness metrics of local models into consideration in the aggregation process. We evaluate $\text{F}^2$GNN empirically versus a number of baseline methods, and demonstrate that $\text{F}^2$GNN outperforms these baselines in terms of both fairness and model accuracy.  ( 3 min )
    Few-Shot In-Context Imitation Learning via Implicit Graph Alignment. (arXiv:2310.12238v1 [cs.RO])
    Consider the following problem: given a few demonstrations of a task across a few different objects, how can a robot learn to perform that same task on new, previously unseen objects? This is challenging because the large variety of objects within a class makes it difficult to infer the task-relevant relationship between the new objects and the objects in the demonstrations. We address this by formulating imitation learning as a conditional alignment problem between graph representations of objects. Consequently, we show that this conditioning allows for in-context learning, where a robot can perform a task on a set of new objects immediately after the demonstrations, without any prior knowledge about the object class or any further training. In our experiments, we explore and validate our design choices, and we show that our method is highly effective for few-shot learning of several real-world, everyday tasks, whilst outperforming baselines. Videos are available on our project webpage at https://www.robot-learning.uk/implicit-graph-alignment.  ( 2 min )
    ClusT3: Information Invariant Test-Time Training. (arXiv:2310.12345v1 [cs.CV])
    Deep Learning models have shown remarkable performance in a broad range of vision tasks. However, they are often vulnerable against domain shifts at test-time. Test-time training (TTT) methods have been developed in an attempt to mitigate these vulnerabilities, where a secondary task is solved at training time simultaneously with the main task, to be later used as an self-supervised proxy task at test-time. In this work, we propose a novel unsupervised TTT technique based on the maximization of Mutual Information between multi-scale feature maps and a discrete latent representation, which can be integrated to the standard training as an auxiliary clustering task. Experimental results demonstrate competitive classification performance on different popular test-time adaptation benchmarks.  ( 2 min )
    Learning to Solve Climate Sensor Placement Problems with a Transformer. (arXiv:2310.12387v1 [cs.LG])
    The optimal placement of sensors for environmental monitoring and disaster management is a challenging problem due to its NP-hard nature. Traditional methods for sensor placement involve exact, approximation, or heuristic approaches, with the latter being the most widely used. However, heuristic methods are limited by expert intuition and experience. Deep learning (DL) has emerged as a promising approach for generating heuristic algorithms automatically. In this paper, we introduce a novel sensor placement approach focused on learning improvement heuristics using deep reinforcement learning (RL) methods. Our approach leverages an RL formulation for learning improvement heuristics, driven by an actor-critic algorithm for training the policy network. We compare our method with several state-of-the-art approaches by conducting comprehensive experiments, demonstrating the effectiveness and superiority of our proposed approach in producing high-quality solutions. Our work presents a promising direction for applying advanced DL and RL techniques to challenging climate sensor placement problems.  ( 2 min )
    Enhancing the Performance of Automated Grade Prediction in MOOC using Graph Representation Learning. (arXiv:2310.12281v1 [cs.LG])
    In recent years, Massive Open Online Courses (MOOCs) have gained significant traction as a rapidly growing phenomenon in online learning. Unlike traditional classrooms, MOOCs offer a unique opportunity to cater to a diverse audience from different backgrounds and geographical locations. Renowned universities and MOOC-specific providers, such as Coursera, offer MOOC courses on various subjects. Automated assessment tasks like grade and early dropout predictions are necessary due to the high enrollment and limited direct interaction between teachers and learners. However, current automated assessment approaches overlook the structural links between different entities involved in the downstream tasks, such as the students and courses. Our hypothesis suggests that these structural relationships, manifested through an interaction graph, contain valuable information that can enhance the performance of the task at hand. To validate this, we construct a unique knowledge graph for a large MOOC dataset, which will be publicly available to the research community. Furthermore, we utilize graph embedding techniques to extract latent structural information encoded in the interactions between entities in the dataset. These techniques do not require ground truth labels and can be utilized for various tasks. Finally, by combining entity-specific features, behavioral features, and extracted structural features, we enhance the performance of predictive machine learning models in student assignment grade prediction. Our experiments demonstrate that structural features can significantly improve the predictive performance of downstream assessment tasks. The code and data are available in \url{https://github.com/DSAatUSU/MOOPer_grade_prediction}  ( 3 min )
    MARVEL: Multi-Agent Reinforcement-Learning for Large-Scale Variable Speed Limits. (arXiv:2310.12359v1 [cs.MA])
    Variable speed limit (VSL) control is a promising traffic management strategy for enhancing safety and mobility. This work introduces MARVEL, a multi-agent reinforcement learning (MARL) framework for implementing large-scale VSL control on freeway corridors using only commonly available data. The agents learn through a reward structure that incorporates adaptability to traffic conditions, safety, and mobility; enabling coordination among the agents. The proposed framework scales to cover corridors with many gantries thanks to a parameter sharing among all VSL agents. The agents are trained in a microsimulation environment based on a short freeway stretch with 8 gantries spanning 7 miles and tested with 34 gantries spanning 17 miles of I-24 near Nashville, TN. MARVEL improves traffic safety by 63.4% compared to the no control scenario and enhances traffic mobility by 14.6% compared to a state-of-the-practice algorithm that has been deployed on I-24. An explainability analysis is undertaken to explore the learned policy under different traffic conditions and the results provide insights into the decision-making process of agents. Finally, we test the policy learned from the simulation-based experiments on real input data from I-24 to illustrate the potential deployment capability of the learned policy.  ( 2 min )
    Architectural Implications of GNN Aggregation Programming Abstractions. (arXiv:2310.12184v1 [cs.LG])
    Graph neural networks (GNNs) have gained significant popularity due to the powerful capability to extract useful representations from graph data. As the need for efficient GNN computation intensifies, a variety of programming abstractions designed for optimizing GNN Aggregation have emerged to facilitate acceleration. However, there is no comprehensive evaluation and analysis upon existing abstractions, thus no clear consensus on which approach is better. In this letter, we classify existing programming abstractions for GNN Aggregation by the dimension of data organization and propagation method. By constructing these abstractions on a state-of-the-art GNN library, we perform a thorough and detailed characterization study to compare their performance and efficiency, and provide several insights on future GNN acceleration based on our analysis.  ( 2 min )
    Balanced Group Convolution: An Improved Group Convolution Based on Approximability Estimates. (arXiv:2310.12461v1 [cs.LG])
    The performance of neural networks has been significantly improved by increasing the number of channels in convolutional layers. However, this increase in performance comes with a higher computational cost, resulting in numerous studies focused on reducing it. One promising approach to address this issue is group convolution, which effectively reduces the computational cost by grouping channels. However, to the best of our knowledge, there has been no theoretical analysis on how well the group convolution approximates the standard convolution. In this paper, we mathematically analyze the approximation of the group convolution to the standard convolution with respect to the number of groups. Furthermore, we propose a novel variant of the group convolution called balanced group convolution, which shows a higher approximation with a small additional computational cost. We provide experimental results that validate our theoretical findings and demonstrate the superior performance of the balanced group convolution over other variants of group convolution.  ( 2 min )
    RK-core: An Established Methodology for Exploring the Hierarchical Structure within Datasets. (arXiv:2310.12168v1 [cs.LG])
    Recently, the field of machine learning has undergone a transition from model-centric to data-centric. The advancements in diverse learning tasks have been propelled by the accumulation of more extensive datasets, subsequently facilitating the training of larger models on these datasets. However, these datasets remain relatively under-explored. To this end, we introduce a pioneering approach known as RK-core, to empower gaining a deeper understanding of the intricate hierarchical structure within datasets. Across several benchmark datasets, we find that samples with low coreness values appear less representative of their respective categories, and conversely, those with high coreness values exhibit greater representativeness. Correspondingly, samples with high coreness values make a more substantial contribution to the performance in comparison to those with low coreness values. Building upon this, we further employ RK-core to analyze the hierarchical structure of samples with different coreset selection methods. Remarkably, we find that a high-quality coreset should exhibit hierarchical diversity instead of solely opting for representative samples. The code is available at https://github.com/yaolu-zjut/Kcore.  ( 2 min )
    Open-Set Multivariate Time-Series Anomaly Detection. (arXiv:2310.12294v1 [cs.LG])
    Numerous methods for time series anomaly detection (TSAD) methods have emerged in recent years. Most existing methods are unsupervised and assume the availability of normal training samples only, while few supervised methods have shown superior performance by incorporating labeled anomalous samples in the training phase. However, certain anomaly types are inherently challenging for unsupervised methods to differentiate from normal data, while supervised methods are constrained to detecting anomalies resembling those present during training, failing to generalize to unseen anomaly classes. This paper is the first attempt in providing a novel approach for the open-set TSAD problem, in which a small number of labeled anomalies from a limited class of anomalies are visible in the training phase, with the objective of detecting both seen and unseen anomaly classes in the test phase. The proposed method, called Multivariate Open-Set timeseries Anomaly Detection (MOSAD) consists of three primary modules: a Feature Extractor to extract meaningful time-series features; a Multi-head Network consisting of Generative-, Deviation-, and Contrastive heads for capturing both seen and unseen anomaly classes; and an Anomaly Scoring module leveraging the insights of the three heads to detect anomalies. Extensive experiments on three real-world datasets consistently show that our approach surpasses existing methods under various experimental settings, thus establishing a new state-of-the-art performance in the TSAD field.  ( 2 min )
    Preference Optimization for Molecular Language Models. (arXiv:2310.12304v1 [stat.ML])
    Molecular language modeling is an effective approach to generating novel chemical structures. However, these models do not \emph{a priori} encode certain preferences a chemist may desire. We investigate the use of fine-tuning using Direct Preference Optimization to better align generated molecules with chemist preferences. Our findings suggest that this approach is simple, efficient, and highly effective.  ( 2 min )
    SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation. (arXiv:2310.12508v1 [cs.LG])
    With evolving data regulations, machine unlearning (MU) has become an important tool for fostering trust and safety in today's AI models. However, existing MU methods focusing on data and/or weight perspectives often grapple with limitations in unlearning accuracy, stability, and cross-domain applicability. To address these challenges, we introduce the concept of 'weight saliency' in MU, drawing parallels with input saliency in model explanation. This innovation directs MU's attention toward specific model weights rather than the entire model, improving effectiveness and efficiency. The resultant method that we call saliency unlearning (SalUn) narrows the performance gap with 'exact' unlearning (model retraining from scratch after removing the forgetting dataset). To the best of our knowledge, SalUn is the first principled MU approach adaptable enough to effectively erase the influence of forgetting data, classes, or concepts in both image classification and generation. For example, SalUn yields a stability advantage in high-variance random data forgetting, e.g., with a 0.2% gap compared to exact unlearning on the CIFAR-10 dataset. Moreover, in preventing conditional diffusion models from generating harmful images, SalUn achieves nearly 100% unlearning accuracy, outperforming current state-of-the-art baselines like Erased Stable Diffusion and Forget-Me-Not.  ( 2 min )
    SDGym: Low-Code Reinforcement Learning Environments using System Dynamics Models. (arXiv:2310.12494v1 [cs.LG])
    Understanding the long-term impact of algorithmic interventions on society is vital to achieving responsible AI. Traditional evaluation strategies often fall short due to the complex, adaptive and dynamic nature of society. While reinforcement learning (RL) can be a powerful approach for optimizing decisions in dynamic settings, the difficulty of realistic environment design remains a barrier to building robust agents that perform well in practical settings. To address this issue we tap into the field of system dynamics (SD) as a complementary method that incorporates collaborative simulation model specification practices. We introduce SDGym, a low-code library built on the OpenAI Gym framework which enables the generation of custom RL environments based on SD simulation models. Through a feasibility study we validate that well specified, rich RL environments can be generated from preexisting SD models and a few lines of configuration code. We demonstrate the capabilities of the SDGym environment using an SD model of the electric vehicle adoption problem. We compare two SD simulators, PySD and BPTK-Py for parity, and train a D4PG agent using the Acme framework to showcase learning and environment interaction. Our preliminary findings underscore the dual potential of SD to improve RL environment design and for RL to improve dynamic policy discovery within SD models. By open-sourcing SDGym, the intent is to galvanize further research and promote adoption across the SD and RL communities, thereby catalyzing collaboration in this emerging interdisciplinary space.  ( 2 min )
    Jorge: Approximate Preconditioning for GPU-efficient Second-order Optimization. (arXiv:2310.12298v1 [cs.LG])
    Despite their better convergence properties compared to first-order optimizers, second-order optimizers for deep learning have been less popular due to their significant computational costs. The primary efficiency bottleneck in such optimizers is matrix inverse calculations in the preconditioning step, which are expensive to compute on GPUs. In this paper, we introduce Jorge, a second-order optimizer that promises the best of both worlds -- rapid convergence benefits of second-order methods, and high computational efficiency typical of first-order methods. We address the primary computational bottleneck of computing matrix inverses by completely eliminating them using an approximation of the preconditioner computation. This makes Jorge extremely efficient on GPUs in terms of wall-clock time. Further, we describe an approach to determine Jorge's hyperparameters directly from a well-tuned SGD baseline, thereby significantly minimizing tuning efforts. Our empirical evaluations demonstrate the distinct advantages of using Jorge, outperforming state-of-the-art optimizers such as SGD, AdamW, and Shampoo across multiple deep learning models, both in terms of sample efficiency and wall-clock time.  ( 2 min )
    CAT: Closed-loop Adversarial Training for Safe End-to-End Driving. (arXiv:2310.12432v1 [cs.LG])
    Driving safety is a top priority for autonomous vehicles. Orthogonal to prior work handling accident-prone traffic events by algorithm designs at the policy level, we investigate a Closed-loop Adversarial Training (CAT) framework for safe end-to-end driving in this paper through the lens of environment augmentation. CAT aims to continuously improve the safety of driving agents by training the agent on safety-critical scenarios that are dynamically generated over time. A novel resampling technique is developed to turn log-replay real-world driving scenarios into safety-critical ones via probabilistic factorization, where the adversarial traffic generation is modeled as the multiplication of standard motion prediction sub-problems. Consequently, CAT can launch more efficient physical attacks compared to existing safety-critical scenario generation methods and yields a significantly less computational cost in the iterative learning pipeline. We incorporate CAT into the MetaDrive simulator and validate our approach on hundreds of driving scenarios imported from real-world driving datasets. Experimental results demonstrate that CAT can effectively generate adversarial scenarios countering the agent being trained. After training, the agent can achieve superior driving safety in both log-replay and safety-critical traffic scenarios on the held-out test set. Code and data are available at https://metadriverse.github.io/cat.  ( 2 min )
  • Open

    Symmetric Neural-Collapse Representations with Supervised Contrastive Loss: The Impact of ReLU and Batching. (arXiv:2306.07960v2 [cs.LG] UPDATED)
    Supervised contrastive loss (SCL) is a competitive and often superior alternative to the cross-entropy loss for classification. While prior studies have demonstrated that both losses yield symmetric training representations under balanced data, this symmetry breaks under class imbalances. This paper presents an intriguing discovery: the introduction of a ReLU activation at the final layer effectively restores the symmetry in SCL-learned representations. We arrive at this finding analytically, by establishing that the global minimizers of an unconstrained features model with SCL loss and entry-wise non-negativity constraints form an orthogonal frame. Extensive experiments conducted across various datasets, architectures, and imbalance scenarios corroborate our finding. Importantly, our experiments reveal that the inclusion of the ReLU activation restores symmetry without compromising test accuracy. This constitutes the first geometry characterization of SCL under imbalances. Additionally, our analysis and experiments underscore the pivotal role of batch selection strategies in representation geometry. By proving necessary and sufficient conditions for mini-batch choices that ensure invariant symmetric representations, we introduce batch-binding as an efficient strategy that guarantees these conditions hold.  ( 2 min )
    A Computational Framework for Solving Wasserstein Lagrangian Flows. (arXiv:2310.10649v2 [cs.LG] CROSS LISTED)
    The dynamical formulation of the optimal transport can be extended through various choices of the underlying geometry ($\textit{kinetic energy}$), and the regularization of density paths ($\textit{potential energy}$). These combinations yield different variational problems ($\textit{Lagrangians}$), encompassing many variations of the optimal transport problem such as the Schr\"odinger bridge, unbalanced optimal transport, and optimal transport with physical constraints, among others. In general, the optimal density path is unknown, and solving these variational problems can be computationally challenging. Leveraging the dual formulation of the Lagrangians, we propose a novel deep learning based framework approaching all of these problems from a unified perspective. Our method does not require simulating or backpropagating through the trajectories of the learned dynamics, and does not need access to optimal couplings. We showcase the versatility of the proposed framework by outperforming previous approaches for the single-cell trajectory inference, where incorporating prior knowledge into the dynamics is crucial for correct predictions.  ( 2 min )
    Seeking the Truth Beyond the Data. An Unsupervised Machine Learning Approach. (arXiv:2207.06949v4 [stat.ML] UPDATED)
    Clustering is an unsupervised machine learning methodology where unlabeled elements/objects are grouped together aiming to the construction of well-established clusters that their elements are classified according to their similarity. The goal of this process is to provide a useful aid to the researcher that will help her/him to identify patterns among the data. Dealing with large databases, such patterns may not be easily detectable without the contribution of a clustering algorithm. This article provides a deep description of the most widely used clustering methodologies accompanied by useful presentations concerning suitable parameter selection and initializations. Simultaneously, this article not only represents a review highlighting the major elements of examined clustering techniques but emphasizes the comparison of these algorithms' clustering efficiency based on 3 datasets, revealing their existing weaknesses and capabilities through accuracy and complexity, during the confrontation of discrete and continuous observations. The produced results help us extract valuable conclusions about the appropriateness of the examined clustering techniques in accordance with the dataset's size.  ( 3 min )
    Communication-Efficient On-Device Machine Learning: Federated Distillation and Augmentation under Non-IID Private Data. (arXiv:1811.11479v2 [cs.LG] UPDATED)
    On-device machine learning (ML) enables the training process to exploit a massive amount of user-generated private data samples. To enjoy this benefit, inter-device communication overhead should be minimized. With this end, we propose federated distillation (FD), a distributed model training algorithm whose communication payload size is much smaller than a benchmark scheme, federated learning (FL), particularly when the model size is large. Moreover, user-generated data samples are likely to become non-IID across devices, which commonly degrades the performance compared to the case with an IID dataset. To cope with this, we propose federated augmentation (FAug), where each device collectively trains a generative model, and thereby augments its local data towards yielding an IID dataset. Empirical studies demonstrate that FD with FAug yields around 26x less communication overhead while achieving 95-98% test accuracy compared to FL.  ( 2 min )
    Optimality Guarantees for Particle Belief Approximation of POMDPs. (arXiv:2210.05015v5 [cs.AI] UPDATED)
    Partially observable Markov decision processes (POMDPs) provide a flexible representation for real-world decision and control problems. However, POMDPs are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid, which is often the case for physical systems. While recent online sampling-based POMDP algorithms that plan with observation likelihood weighting have shown practical effectiveness, a general theory characterizing the approximation error of the particle filtering techniques that these algorithms use has not previously been proposed. Our main contribution is bounding the error between any POMDP and its corresponding finite sample particle belief MDP (PB-MDP) approximation. This fundamental bridge between PB-MDPs and POMDPs allows us to adapt any sampling-based MDP algorithm to a POMDP by solving the corresponding particle belief MDP, thereby extending the convergence guarantees of the MDP algorithm to the POMDP. Practically, this is implemented by using the particle filter belief transition model as the generative model for the MDP solver. While this requires access to the observation density model from the POMDP, it only increases the transition sampling complexity of the MDP solver by a factor of $\mathcal{O}(C)$, where $C$ is the number of particles. Thus, when combined with sparse sampling MDP algorithms, this approach can yield algorithms for POMDPs that have no direct theoretical dependence on the size of the state and observation spaces. In addition to our theoretical contribution, we perform five numerical experiments on benchmark POMDPs to demonstrate that a simple MDP algorithm adapted using PB-MDP approximation, Sparse-PFT, achieves performance competitive with other leading continuous observation POMDP solvers.  ( 3 min )
    Variational Inference for SDEs Driven by Fractional Noise. (arXiv:2310.12975v1 [cs.LG])
    We present a novel variational framework for performing inference in (neural) stochastic differential equations (SDEs) driven by Markov-approximate fractional Brownian motion (fBM). SDEs offer a versatile tool for modeling real-world continuous-time dynamic systems with inherent noise and randomness. Combining SDEs with the powerful inference capabilities of variational methods, enables the learning of representative function distributions through stochastic gradient descent. However, conventional SDEs typically assume the underlying noise to follow a Brownian motion (BM), which hinders their ability to capture long-term dependencies. In contrast, fractional Brownian motion (fBM) extends BM to encompass non-Markovian dynamics, but existing methods for inferring fBM parameters are either computationally demanding or statistically inefficient. In this paper, building upon the Markov approximation of fBM, we derive the evidence lower bound essential for efficient variational inference of posterior path measures, drawing from the well-established field of stochastic analysis. Additionally, we provide a closed-form expression to determine optimal approximation coefficients. Furthermore, we propose the use of neural networks to learn the drift, diffusion and control terms within our variational posterior, leading to the variational training of neural-SDEs. In this framework, we also optimize the Hurst index, governing the nature of our fractional noise. Beyond validation on synthetic data, we contribute a novel architecture for variational latent video prediction,-an approach that, to the best of our knowledge, enables the first variational neural-SDE application to video perception.  ( 3 min )
    The Kernel Density Integral Transformation. (arXiv:2309.10194v2 [stat.ML] UPDATED)
    Feature preprocessing continues to play a critical role when applying machine learning and statistical methods to tabular data. In this paper, we propose the use of the kernel density integral transformation as a feature preprocessing step. Our approach subsumes the two leading feature preprocessing methods as limiting cases: linear min-max scaling and quantile transformation. We demonstrate that, without hyperparameter tuning, the kernel density integral transformation can be used as a simple drop-in replacement for either method, offering protection from the weaknesses of each. Alternatively, with tuning of a single continuous hyperparameter, we frequently outperform both of these methods. Finally, we show that the kernel density transformation can be profitably applied to statistical data analysis, particularly in correlation analysis and univariate clustering.  ( 2 min )
    PAC Prediction Sets Under Label Shift. (arXiv:2310.12964v1 [stat.ML])
    Prediction sets capture uncertainty by predicting sets of labels rather than individual labels, enabling downstream decisions to conservatively account for all plausible outcomes. Conformal inference algorithms construct prediction sets guaranteed to contain the true label with high probability. These guarantees fail to hold in the face of distribution shift, which is precisely when reliable uncertainty quantification can be most useful. We propose a novel algorithm for constructing prediction sets with PAC guarantees in the label shift setting. This method estimates the predicted probabilities of the classes in a target domain, as well as the confusion matrix, then propagates uncertainty in these estimates through a Gaussian elimination algorithm to compute confidence intervals for importance weights. Finally, it uses these intervals to construct prediction sets. We evaluate our approach on five datasets: the CIFAR-10, ChestX-Ray and Entity-13 image datasets, the tabular CDC Heart dataset, and the AGNews text dataset. Our algorithm satisfies the PAC guarantee while producing smaller, more informative, prediction sets compared to several baselines.  ( 2 min )
    The Adaptive $\tau$-Lasso: Robustness and Oracle Properties. (arXiv:2304.09310v2 [stat.ML] UPDATED)
    This paper introduces a new regularized version of the robust $\tau$-regression estimator for analyzing high-dimensional datasets subject to gross contamination in the response variables and covariates (explanatory variables). The resulting estimator, termed adaptive $\tau$-Lasso, is robust to outliers and high-leverage points. It also incorporates an adaptive $\ell_1$-norm penalty term, which enables the selection of relevant variables and reduces the bias associated with large true regression coefficients. More specifically, this adaptive $\ell_1$-norm penalty term assigns a weight to each regression coefficient. For a fixed number of predictors $p$, we show that the adaptive $\tau$-Lasso has the oracle property, ensuring both variable-selection consistency and asymptotic normality. Asymptotic normality applies only to the entries of the regression vector corresponding to the true support, assuming knowledge of the true regression vector support. We characterize its robustness via the finite-sample breakdown point and the influence function. We carry out extensive simulations and observe that the class of $\tau$-Lasso estimators exhibits robustness and reliable performance in both contaminated and uncontaminated data settings. We also validate our theoretical findings on robustness properties through simulation experiments. In the face of outliers and high-leverage points, the adaptive $\tau$-Lasso and $\tau$-Lasso estimators achieve the best performance or close-to-best performance in terms of prediction and variable selection accuracy compared to other competing regularized estimators for all scenarios considered in this study. Therefore, the adaptive $\tau$-Lasso and $\tau$-Lasso estimators can be effectively employed for a variety of sparse linear regression problems, particularly in high-dimensional settings and when the data is contaminated by outliers and high-leverage points.  ( 3 min )
    URL: A Representation Learning Benchmark for Transferable Uncertainty Estimates. (arXiv:2307.03810v2 [cs.LG] UPDATED)
    Representation learning has significantly driven the field to develop pretrained models that can act as a valuable starting point when transferring to new datasets. With the rising demand for reliable machine learning and uncertainty quantification, there is a need for pretrained models that not only provide embeddings but also transferable uncertainty estimates. To guide the development of such models, we propose the Uncertainty-aware Representation Learning (URL) benchmark. Besides the transferability of the representations, it also measures the zero-shot transferability of the uncertainty estimate using a novel metric. We apply URL to evaluate eleven uncertainty quantifiers that are pretrained on ImageNet and transferred to eight downstream datasets. We find that approaches that focus on the uncertainty of the representation itself or estimate the prediction risk directly outperform those that are based on the probabilities of upstream classes. Yet, achieving transferable uncertainty quantification remains an open challenge. Our findings indicate that it is not necessarily in conflict with traditional representation learning goals. Code is provided under https://github.com/mkirchhof/url .  ( 2 min )
    Sequential Gibbs Posteriors with Applications to Principal Component Analysis. (arXiv:2310.12882v1 [stat.ME])
    Gibbs posteriors are proportional to a prior distribution multiplied by an exponentiated loss function, with a key tuning parameter weighting information in the loss relative to the prior and providing a control of posterior uncertainty. Gibbs posteriors provide a principled framework for likelihood-free Bayesian inference, but in many situations, including a single tuning parameter inevitably leads to poor uncertainty quantification. In particular, regardless of the value of the parameter, credible regions have far from the nominal frequentist coverage even in large samples. We propose a sequential extension to Gibbs posteriors to address this problem. We prove the proposed sequential posterior exhibits concentration and a Bernstein-von Mises theorem, which holds under easy to verify conditions in Euclidean space and on manifolds. As a byproduct, we obtain the first Bernstein-von Mises theorem for traditional likelihood-based Bayesian posteriors on manifolds. All methods are illustrated with an application to principal component analysis.  ( 2 min )
    A path-norm toolkit for modern networks: consequences, promises and challenges. (arXiv:2310.01225v2 [stat.ML] UPDATED)
    This work introduces the first toolkit around path-norms that is fully able to encompass general DAG ReLU networks with biases, skip connections and any operation based on the extraction of order statistics: max pooling, GroupSort etc. This toolkit notably allows us to establish generalization bounds for modern neural networks that are not only the most widely applicable path-norm based ones, but also recover or beat the sharpest known bounds of this type. These extended path-norms further enjoy the usual benefits of path-norms: ease of computation, invariance under the symmetries of the network, and improved sharpness on feedforward networks compared to the product of operators' norms, another complexity measure most commonly used. The versatility of the toolkit and its ease of implementation allow us to challenge the concrete promises of path-norm-based generalization bounds, by numerically evaluating the sharpest known bounds for ResNets on ImageNet.  ( 2 min )
    Evaluating Superhuman Models with Consistency Checks. (arXiv:2306.09983v3 [cs.LG] UPDATED)
    If machine learning models were to achieve superhuman abilities at various reasoning or decision-making tasks, how would we go about evaluating such models, given that humans would necessarily be poor proxies for ground truth? In this paper, we propose a framework for evaluating superhuman models via consistency checks. Our premise is that while the correctness of superhuman decisions may be impossible to evaluate, we can still surface mistakes if the model's decisions fail to satisfy certain logical, human-interpretable rules. We instantiate our framework on three tasks where correctness of decisions is hard to evaluate due to either superhuman model abilities, or to otherwise missing ground truth: evaluating chess positions, forecasting future events, and making legal judgments. We show that regardless of a model's (possibly superhuman) performance on these tasks, we can discover logical inconsistencies in decision making. For example: a chess engine assigning opposing valuations to semantically identical boards; GPT-4 forecasting that sports records will evolve non-monotonically over time; or an AI judge assigning bail to a defendant only after we add a felony to their criminal record.  ( 2 min )
    Model-agnostic variable importance for predictive uncertainty: an entropy-based approach. (arXiv:2310.12842v1 [stat.ML])
    In order to trust the predictions of a machine learning algorithm, it is necessary to understand the factors that contribute to those predictions. In the case of probabilistic and uncertainty-aware models, it is necessary to understand not only the reasons for the predictions themselves, but also the model's level of confidence in those predictions. In this paper, we show how existing methods in explainability can be extended to uncertainty-aware models and how such extensions can be used to understand the sources of uncertainty in a model's predictive distribution. In particular, by adapting permutation feature importance, partial dependence plots, and individual conditional expectation plots, we demonstrate that novel insights into model behaviour may be obtained and that these methods can be used to measure the impact of features on both the entropy of the predictive distribution and the log-likelihood of the ground truth labels under that distribution. With experiments using both synthetic and real-world data, we demonstrate the utility of these approaches in understanding both the sources of uncertainty and their impact on model performance.  ( 2 min )
    Log-density gradient covariance and automatic metric tensors for Riemann manifold Monte Carlo methods. (arXiv:2211.01746v2 [stat.CO] UPDATED)
    A metric tensor for Riemann manifold Monte Carlo particularly suited for non-linear Bayesian hierarchical models is proposed. The metric tensor is built from symmetric positive semidefinite log-density gradient covariance (LGC) matrices, which are also proposed and further explored here. The LGCs generalize the Fisher information matrix by measuring the joint information content and dependence structure of both a random variable and the parameters of said variable. Consequently, positive definite Fisher/LGC-based metric tensors may be constructed not only from the observation likelihoods as is current practice, but also from arbitrarily complicated non-linear prior/latent variable structures, provided the LGC may be derived for each conditional distribution used to construct said structures. The proposed methodology is highly automatic and allows for exploitation of any sparsity associated with the model in question. When implemented in conjunction with a Riemann manifold variant of the recently proposed numerical generalized randomized Hamiltonian Monte Carlo processes, the proposed methodology is highly competitive, in particular for the more challenging target distributions associated with Bayesian hierarchical models.  ( 2 min )
    Physics-informed neural networks in the recreation of hydrodynamic simulations from dark matter. (arXiv:2303.14090v2 [astro-ph.CO] UPDATED)
    Physics-informed neural networks have emerged as a coherent framework for building predictive models that combine statistical patterns with domain knowledge. The underlying notion is to enrich the optimization loss function with known relationships to constrain the space of possible solutions. Hydrodynamic simulations are a core constituent of modern cosmology, while the required computations are both expensive and time-consuming. At the same time, the comparatively fast simulation of dark matter requires fewer resources, which has led to the emergence of machine learning algorithms for baryon inpainting as an active area of research; here, recreating the scatter found in hydrodynamic simulations is an ongoing challenge. This paper presents the first application of physics-informed neural networks to baryon inpainting by combining advances in neural network architectures with physical constraints, injecting theory on baryon conversion efficiency into the model loss function. We also introduce a punitive prediction comparison based on the Kullback-Leibler divergence, which enforces scatter reproduction. By simultaneously extracting the complete set of baryonic properties for the Simba suite of cosmological simulations, our results demonstrate improved accuracy of baryonic predictions based on dark matter halo properties, successful recovery of the fundamental metallicity relation, and retrieve scatter that traces the target simulation's distribution.  ( 3 min )
    EDGI: Equivariant Diffusion for Planning with Embodied Agents. (arXiv:2303.12410v2 [cs.LG] UPDATED)
    Embodied agents operate in a structured world, often solving tasks with spatial, temporal, and permutation symmetries. Most algorithms for planning and model-based reinforcement learning (MBRL) do not take this rich geometric structure into account, leading to sample inefficiency and poor generalization. We introduce the Equivariant Diffuser for Generating Interactions (EDGI), an algorithm for MBRL and planning that is equivariant with respect to the product of the spatial symmetry group SE(3), the discrete-time translation group Z, and the object permutation group Sn. EDGI follows the Diffuser framework (Janner et al., 2022) in treating both learning a world model and planning in it as a conditional generative modeling problem, training a diffusion model on an offline trajectory dataset. We introduce a new SE(3)xZxSn-equivariant diffusion model that supports multiple representations. We integrate this model in a planning loop, where conditioning and classifier guidance let us softly break the symmetry for specific tasks as needed. On object manipulation and navigation tasks, EDGI is substantially more sample efficient and generalizes better across the symmetry group than non-equivariant models.  ( 2 min )
    Generative Flow Networks as Entropy-Regularized RL. (arXiv:2310.12934v1 [cs.LG])
    The recently proposed generative flow networks (GFlowNets) are a method of training a policy to sample compositional discrete objects with probabilities proportional to a given reward via a sequence of actions. GFlowNets exploit the sequential nature of the problem, drawing parallels with reinforcement learning (RL). Our work extends the connection between RL and GFlowNets to a general case. We demonstrate how the task of learning a generative flow network can be efficiently redefined as an entropy-regularized RL problem with a specific reward and regularizer structure. Furthermore, we illustrate the practical efficiency of this reformulation by applying standard soft RL algorithms to GFlowNet training across several probabilistic modeling tasks. Contrary to previously reported results, we show that entropic RL approaches can be competitive against established GFlowNet training methods. This perspective opens a direct path for integrating reinforcement learning principles into the realm of generative flow networks.  ( 2 min )
    Piecewise Deterministic Markov Processes for Bayesian Neural Networks. (arXiv:2302.08724v2 [stat.ML] UPDATED)
    Inference on modern Bayesian Neural Networks (BNNs) often relies on a variational inference treatment, imposing violated assumptions of independence and the form of the posterior. Traditional MCMC approaches avoid these assumptions at the cost of increased computation due to its incompatibility to subsampling of the likelihood. New Piecewise Deterministic Markov Process (PDMP) samplers permit subsampling, though introduce a model specific inhomogenous Poisson Process (IPPs) which is difficult to sample from. This work introduces a new generic and adaptive thinning scheme for sampling from these IPPs, and demonstrates how this approach can accelerate the application of PDMPs for inference in BNNs. Experimentation illustrates how inference with these methods is computationally feasible, can improve predictive accuracy, MCMC mixing performance, and provide informative uncertainty measurements when compared against other approximate inference schemes.  ( 2 min )
    Deep Discriminative to Kernel Density Networks for Calibrated Inference. (arXiv:2201.13001v6 [cs.LG] UPDATED)
    Deep discriminative approaches like random forests and deep neural networks have recently found applications in many important real-world scenarios. However, deploying these learning algorithms in safety-critical applications raises concerns, particularly when it comes to ensuring confidence calibration for both in-distribution and out-of-distribution data points. Many popular methods for in-distribution (ID) calibration, such as isotonic regression and Platt's sigmoidal regression, exhibit excellent ID calibration performance but often at the cost of classification accuracy. Moreover, these methods are not calibrated for the entire feature space, leading to overconfidence in the case of out-of-distribution (OOD) samples. In this paper, we leveraged the fact that deep models, including both random forests and deep-nets, learn internal representations which are unions of polytopes with affine activation functions to conceptualize them both as partitioning rules of the feature space. We replace the affine function in each polytope populated by the training data with a Gaussian kernel. We propose sufficient conditions for our proposed methods to be consistent estimators of the corresponding class conditional densities. Moreover, our experiments on both tabular and vision benchmarks show that the proposed approaches obtain well-calibrated posteriors while mostly preserving or improving the classification accuracy of the original algorithm for in-distribution region, and extrapolates beyond the training data to handle out-of-distribution inputs appropriately.  ( 3 min )
    Neurosymbolic Grounding for Compositional World Models. (arXiv:2310.12690v1 [cs.LG])
    We introduce Cosmos, a framework for object-centric world modeling that is designed for compositional generalization (CG), i.e., high performance on unseen input scenes obtained through the composition of known visual "atoms." The central insight behind Cosmos is the use of a novel form of neurosymbolic grounding. Specifically, the framework introduces two new tools: (i) neurosymbolic scene encodings, which represent each entity in a scene using a real vector computed using a neural encoder, as well as a vector of composable symbols describing attributes of the entity, and (ii) a neurosymbolic attention mechanism that binds these entities to learned rules of interaction. Cosmos is end-to-end differentiable; also, unlike traditional neurosymbolic methods that require representations to be manually mapped to symbols, it computes an entity's symbolic attributes using vision-language foundation models. Through an evaluation that considers two different forms of CG on an established blocks-pushing domain, we show that the framework establishes a new state-of-the-art for CG in world modeling.  ( 2 min )
    DCSI -- An improved measure of cluster separability based on separation and connectedness. (arXiv:2310.12806v1 [stat.ML])
    Whether class labels in a given data set correspond to meaningful clusters is crucial for the evaluation of clustering algorithms using real-world data sets. This property can be quantified by separability measures. A review of the existing literature shows that neither classification-based complexity measures nor cluster validity indices (CVIs) adequately incorporate the central aspects of separability for density-based clustering: between-class separation and within-class connectedness. A newly developed measure (density cluster separability index, DCSI) aims to quantify these two characteristics and can also be used as a CVI. Extensive experiments on synthetic data indicate that DCSI correlates strongly with the performance of DBSCAN measured via the adjusted rand index (ARI) but lacks robustness when it comes to multi-class data sets with overlapping classes that are ill-suited for density-based hard clustering. Detailed evaluation on frequently used real-world data sets shows that DCSI can correctly identify touching or overlapping classes that do not form meaningful clusters.  ( 2 min )
    Compression of Recurrent Neural Networks using Matrix Factorization. (arXiv:2310.12688v1 [cs.LG])
    Compressing neural networks is a key step when deploying models for real-time or embedded applications. Factorizing the model's matrices using low-rank approximations is a promising method for achieving compression. While it is possible to set the rank before training, this approach is neither flexible nor optimal. In this work, we propose a post-training rank-selection method called Rank-Tuning that selects a different rank for each matrix. Used in combination with training adaptations, our method achieves high compression rates with no or little performance degradation. Our numerical experiments on signal processing tasks show that we can compress recurrent neural networks up to 14x with at most 1.4% relative performance reduction.  ( 2 min )
    Conditional Density Estimations from Privacy-Protected Data. (arXiv:2310.12781v1 [stat.ML])
    Many modern statistical analysis and machine learning applications require training models on sensitive user data. Differential privacy provides a formal guarantee that individual-level information about users does not leak. In this framework, randomized algorithms inject calibrated noise into the confidential data, resulting in privacy-protected datasets or queries. However, restricting access to only the privatized data during statistical analysis makes it computationally challenging to perform valid inferences on parameters underlying the confidential data. In this work, we propose simulation-based inference methods from privacy-protected datasets. Specifically, we use neural conditional density estimators as a flexible family of distributions to approximate the posterior distribution of model parameters given the observed private query results. We illustrate our methods on discrete time-series data under an infectious disease model and on ordinary linear regression models. Illustrating the privacy-utility trade-off, our experiments and analysis demonstrate the necessity and feasibility of designing valid statistical inference procedures to correct for biases introduced by the privacy-protection mechanisms.  ( 2 min )
    Causal Similarity-Based Hierarchical Bayesian Models. (arXiv:2310.12595v1 [cs.LG])
    The key challenge underlying machine learning is generalisation to new data. This work studies generalisation for datasets consisting of related tasks that may differ in causal mechanisms. For example, observational medical data for complex diseases suffers from heterogeneity in causal mechanisms of disease across patients, creating challenges for machine learning algorithms that need to generalise to new patients outside of the training dataset. Common approaches for learning supervised models with heterogeneous datasets include learning a global model for the entire dataset, learning local models for each tasks' data, or utilising hierarchical, meta-learning and multi-task learning approaches to learn how to generalise from data pooled across multiple tasks. In this paper we propose causal similarity-based hierarchical Bayesian models to improve generalisation to new tasks by learning how to pool data from training tasks with similar causal mechanisms. We apply this general modelling principle to Bayesian neural networks and compare a variety of methods for estimating causal task similarity (for both known and unknown causal models). We demonstrate the benefits of our approach and applicability to real world problems through a range of experiments on simulated and real data.  ( 2 min )
    STANLEY: Stochastic Gradient Anisotropic Langevin Dynamics for Learning Energy-Based Models. (arXiv:2310.12667v1 [stat.ML])
    We propose in this paper, STANLEY, a STochastic gradient ANisotropic LangEvin dYnamics, for sampling high dimensional data. With the growing efficacy and potential of Energy-Based modeling, also known as non-normalized probabilistic modeling, for modeling a generative process of different natures of high dimensional data observations, we present an end-to-end learning algorithm for Energy-Based models (EBM) with the purpose of improving the quality of the resulting sampled data points. While the unknown normalizing constant of EBMs makes the training procedure intractable, resorting to Markov Chain Monte Carlo (MCMC) is in general a viable option. Realizing what MCMC entails for the EBM training, we propose in this paper, a novel high dimensional sampling method, based on an anisotropic stepsize and a gradient-informed covariance matrix, embedded into a discretized Langevin diffusion. We motivate the necessity for an anisotropic update of the negative samples in the Markov Chain by the nonlinearity of the backbone of the EBM, here a Convolutional Neural Network. Our resulting method, namely STANLEY, is an optimization algorithm for training Energy-Based models via our newly introduced MCMC method. We provide a theoretical understanding of our sampling scheme by proving that the sampler leads to a geometrically uniformly ergodic Markov Chain. Several image generation experiments are provided in our paper to show the effectiveness of our method.  ( 2 min )
    Generating collective counterfactual explanations in score-based classification via mathematical optimization. (arXiv:2310.12822v1 [stat.ML])
    Due to the increasing use of Machine Learning models in high stakes decision making settings, it has become increasingly important to have tools to understand how models arrive at decisions. Assuming a trained Supervised Classification model, explanations can be obtained via counterfactual analysis: a counterfactual explanation of an instance indicates how this instance should be minimally modified so that the perturbed instance is classified in the desired class by the Machine Learning classification model. Most of the Counterfactual Analysis literature focuses on the single-instance single-counterfactual setting, in which the analysis is done for one single instance to provide one single explanation. Taking a stakeholder's perspective, in this paper we introduce the so-called collective counterfactual explanations. By means of novel Mathematical Optimization models, we provide a counterfactual explanation for each instance in a group of interest, so that the total cost of the perturbations is minimized under some linking constraints. Making the process of constructing counterfactuals collective instead of individual enables us to detect the features that are critical to the entire dataset to have the individuals classified in the desired class. Our methodology allows for some instances to be treated individually, performing the collective counterfactual analysis for a fraction of records of the group of interest. This way, outliers are identified and handled appropriately. Under some assumptions on the classifier and the space in which counterfactuals are sought, finding collective counterfactuals is reduced to solving a convex quadratic linearly constrained mixed integer optimization problem, which, for datasets of moderate size, can be solved to optimality using existing solvers. The performance of our approach is illustrated on real-world datasets, demonstrating its usefulness.  ( 3 min )
    On the Optimization and Generalization of Multi-head Attention. (arXiv:2310.12680v1 [cs.LG])
    The training and generalization dynamics of the Transformer's core mechanism, namely the Attention mechanism, remain under-explored. Besides, existing analyses primarily focus on single-head attention. Inspired by the demonstrated benefits of overparameterization when training fully-connected networks, we investigate the potential optimization and generalization advantages of using multiple attention heads. Towards this goal, we derive convergence and generalization guarantees for gradient-descent training of a single-layer multi-head self-attention model, under a suitable realizability condition on the data. We then establish primitive conditions on the initialization that ensure realizability holds. Finally, we demonstrate that these conditions are satisfied for a simple tokenized-mixture model. We expect the analysis can be extended to various data-model and architecture variations.  ( 2 min )
    How a student becomes a teacher: learning and forgetting through Spectral methods. (arXiv:2310.12612v1 [cs.LG])
    In theoretical ML, the teacher-student paradigm is often employed as an effective metaphor for real-life tuition. The above scheme proves particularly relevant when the student network is overparameterized as compared to the teacher network. Under these operating conditions, it is tempting to speculate that the student ability to handle the given task could be eventually stored in a sub-portion of the whole network. This latter should be to some extent reminiscent of the frozen teacher structure, according to suitable metrics, while being approximately invariant across different architectures of the student candidate network. Unfortunately, state-of-the-art conventional learning techniques could not help in identifying the existence of such an invariant subnetwork, due to the inherent degree of non-convexity that characterizes the examined problem. In this work, we take a leap forward by proposing a radically different optimization scheme which builds on a spectral representation of the linear transfer of information between layers. The gradient is hence calculated with respect to both eigenvalues and eigenvectors with negligible increase in terms of computational and complexity load, as compared to standard training algorithms. Working in this framework, we could isolate a stable student substructure, that mirrors the true complexity of the teacher in terms of computing neurons, path distribution and topological attributes. When pruning unimportant nodes of the trained student, as follows a ranking that reflects the optimized eigenvalues, no degradation in the recorded performance is seen above a threshold that corresponds to the effective teacher size. The observed behavior can be pictured as a genuine second-order phase transition that bears universality traits.  ( 3 min )
    Constrained Reweighting of Distributions: an Optimal Transport Approach. (arXiv:2310.12447v1 [stat.ML])
    We commonly encounter the problem of identifying an optimally weight adjusted version of the empirical distribution of observed data, adhering to predefined constraints on the weights. Such constraints often manifest as restrictions on the moments, tail behaviour, shapes, number of modes, etc., of the resulting weight adjusted empirical distribution. In this article, we substantially enhance the flexibility of such methodology by introducing a nonparametrically imbued distributional constraints on the weights, and developing a general framework leveraging the maximum entropy principle and tools from optimal transport. The key idea is to ensure that the maximum entropy weight adjusted empirical distribution of the observed data is close to a pre-specified probability distribution in terms of the optimal transport metric while allowing for subtle departures. The versatility of the framework is demonstrated in the context of three disparate applications where data re-weighting is warranted to satisfy side constraints on the optimization problem at the heart of the statistical task: namely, portfolio allocation, semi-parametric inference for complex surveys, and ensuring algorithmic fairness in machine learning algorithms.  ( 2 min )
    Canonical normalizing flows for manifold learning. (arXiv:2310.12743v1 [stat.ML])
    Manifold learning flows are a class of generative modelling techniques that assume a low-dimensional manifold description of the data. The embedding of such manifold into the high-dimensional space of the data is achieved via learnable invertible transformations. Therefore, once the manifold is properly aligned via a reconstruction loss, the probability density is tractable on the manifold and maximum likelihood can be used optimize the network parameters. Naturally, the lower-dimensional representation of the data requires an injective-mapping. Recent approaches were able to enforce that density aligns with the modelled manifold, while efficiently calculating the density volume-change term when embedding to the higher-dimensional space. However, unless the injective-mapping is analytically predefined, the learned manifold is not necessarily an efficient representation of the data. Namely, the latent dimensions of such models frequently learn an entangled intrinsic basis with degenerate information being stored in each dimension. Alternatively, if a locally orthogonal and/or sparse basis is to be learned, here coined canonical intrinsic basis, it can serve in learning a more compact latent space representation. Towards this end, we propose a canonical manifold learning flow method, where a novel optimization objective enforces the transformation matrix to have few prominent and orthogonal basis functions. Canonical manifold flow yields a more efficient use of the latent space, automatically generating fewer prominent and distinct dimensions to represent data, and consequently a better approximation of target distributions than other manifold flow methods in most experiments we conducted, resulting in lower FID scores.  ( 2 min )
    Approximate information maximization for bandit games. (arXiv:2310.12563v1 [stat.ML])
    Entropy maximization and free energy minimization are general physical principles for modeling the dynamics of various physical systems. Notable examples include modeling decision-making within the brain using the free-energy principle, optimizing the accuracy-complexity trade-off when accessing hidden variables with the information bottleneck principle (Tishby et al., 2000), and navigation in random environments using information maximization (Vergassola et al., 2007). Built on this principle, we propose a new class of bandit algorithms that maximize an approximation to the information of a key variable within the system. To this end, we develop an approximated analytical physics-based representation of an entropy to forecast the information gain of each action and greedily choose the one with the largest information gain. This method yields strong performances in classical bandit settings. Motivated by its empirical success, we prove its asymptotic optimality for the two-armed bandit problem with Gaussian rewards. Owing to its ability to encompass the system's properties in a global physical functional, this approach can be efficiently adapted to more complex bandit settings, calling for further investigation of information maximization approaches for multi-armed bandit problems.  ( 2 min )
    Optimal Excess Risk Bounds for Empirical Risk Minimization on $p$-norm Linear Regression. (arXiv:2310.12437v1 [math.ST])
    We study the performance of empirical risk minimization on the $p$-norm linear regression problem for $p \in (1, \infty)$. We show that, in the realizable case, under no moment assumptions, and up to a distribution-dependent constant, $O(d)$ samples are enough to exactly recover the target. Otherwise, for $p \in [2, \infty)$, and under weak moment assumptions on the target and the covariates, we prove a high probability excess risk bound on the empirical risk minimizer whose leading term matches, up to a constant that depends only on $p$, the asymptotically exact rate. We extend this result to the case $p \in (1, 2)$ under mild assumptions that guarantee the existence of the Hessian of the risk at its minimizer.  ( 2 min )
    Explanation-Based Training with Differentiable Insertion/Deletion Metric-Aware Regularizers. (arXiv:2310.12553v1 [cs.LG])
    The quality of explanations for the predictions of complex machine learning predictors is often measured using insertion and deletion metrics, which assess the faithfulness of the explanations, i.e., how correctly the explanations reflect the predictor's behavior. To improve the faithfulness, we propose insertion/deletion metric-aware explanation-based optimization (ID-ExpO), which optimizes differentiable predictors to improve both insertion and deletion scores of the explanations while keeping their predictive accuracy. Since the original insertion and deletion metrics are indifferentiable with respect to the explanations and directly unavailable for gradient-based optimization, we extend the metrics to be differentiable and use them to formalize insertion and deletion metric-based regularizers. The experimental results on image and tabular datasets show that the deep neural networks-based predictors fine-tuned using ID-ExpO enable popular post-hoc explainers to produce more faithful and easy-to-interpret explanations while keeping high predictive accuracy.  ( 2 min )
    Unmasking Transformers: A Theoretical Approach to Data Recovery via Attention Weights. (arXiv:2310.12462v1 [cs.LG])
    In the realm of deep learning, transformers have emerged as a dominant architecture, particularly in natural language processing tasks. However, with their widespread adoption, concerns regarding the security and privacy of the data processed by these models have arisen. In this paper, we address a pivotal question: Can the data fed into transformers be recovered using their attention weights and outputs? We introduce a theoretical framework to tackle this problem. Specifically, we present an algorithm that aims to recover the input data $X \in \mathbb{R}^{d \times n}$ from given attention weights $W = QK^\top \in \mathbb{R}^{d \times d}$ and output $B \in \mathbb{R}^{n \times n}$ by minimizing the loss function $L(X)$. This loss function captures the discrepancy between the expected output and the actual output of the transformer. Our findings have significant implications for the Localized Layer-wise Mechanism (LLM), suggesting potential vulnerabilities in the model's design from a security and privacy perspective. This work underscores the importance of understanding and safeguarding the internal workings of transformers to ensure the confidentiality of processed data.  ( 2 min )
    Neural Likelihood Approximation for Integer Valued Time Series Data. (arXiv:2310.12544v1 [stat.ML])
    Stochastic processes defined on integer valued state spaces are popular within the physical and biological sciences. These models are necessary for capturing the dynamics of small systems where the individual nature of the populations cannot be ignored and stochastic effects are important. The inference of the parameters of such models, from time series data, is difficult due to intractability of the likelihood; current methods, based on simulations of the underlying model, can be so computationally expensive as to be prohibitive. In this paper we construct a neural likelihood approximation for integer valued time series data using causal convolutions, which allows us to evaluate the likelihood of the whole time series in parallel. We demonstrate our method by performing inference on a number of ecological and epidemiological models, showing that we can accurately approximate the true posterior while achieving significant computational speed ups in situations where current methods struggle.  ( 2 min )
    Closed-Form Diffusion Models. (arXiv:2310.12395v1 [cs.LG])
    Score-based generative models (SGMs) sample from a target distribution by iteratively transforming noise using the score function of the perturbed target. For any finite training set, this score function can be evaluated in closed form, but the resulting SGM memorizes its training data and does not generate novel samples. In practice, one approximates the score by training a neural network via score-matching. The error in this approximation promotes generalization, but neural SGMs are costly to train and sample, and the effective regularization this error provides is not well-understood theoretically. In this work, we instead explicitly smooth the closed-form score to obtain an SGM that generates novel samples without training. We analyze our model and propose an efficient nearest-neighbor-based estimator of its score function. Using this estimator, our method achieves sampling times competitive with neural SGMs while running on consumer-grade CPUs.  ( 2 min )
    Towards Enhanced Local Explainability of Random Forests: a Proximity-Based Approach. (arXiv:2310.12428v1 [stat.ML])
    We initiate a novel approach to explain the out of sample performance of random forest (RF) models by exploiting the fact that any RF can be formulated as an adaptive weighted K nearest-neighbors model. Specifically, we use the proximity between points in the feature space learned by the RF to re-write random forest predictions exactly as a weighted average of the target labels of training data points. This linearity facilitates a local notion of explainability of RF predictions that generates attributions for any model prediction across observations in the training set, and thereby complements established methods like SHAP, which instead generates attributions for a model prediction across dimensions of the feature space. We demonstrate this approach in the context of a bond pricing model trained on US corporate bond trades, and compare our approach to various existing approaches to model explainability.  ( 2 min )
    Sparse high-dimensional linear mixed modeling with a partitioned empirical Bayes ECM algorithm. (arXiv:2310.12285v1 [stat.ME])
    High-dimensional longitudinal data is increasingly used in a wide range of scientific studies. However, there are few statistical methods for high-dimensional linear mixed models (LMMs), as most Bayesian variable selection or penalization methods are designed for independent observations. Additionally, the few available software packages for high-dimensional LMMs suffer from scalability issues. This work presents an efficient and accurate Bayesian framework for high-dimensional LMMs. We use empirical Bayes estimators of hyperparameters for increased flexibility and an Expectation-Conditional-Minimization (ECM) algorithm for computationally efficient maximum a posteriori probability (MAP) estimation of parameters. The novelty of the approach lies in its partitioning and parameter expansion as well as its fast and scalable computation. We illustrate Linear Mixed Modeling with PaRtitiOned empirical Bayes ECM (LMM-PROBE) in simulation studies evaluating fixed and random effects estimation along with computation time. A real-world example is provided using data from a study of lupus in children, where we identify genes and clinical factors associated with a new lupus biomarker and predict the biomarker over time.  ( 2 min )
    Preference Optimization for Molecular Language Models. (arXiv:2310.12304v1 [stat.ML])
    Molecular language modeling is an effective approach to generating novel chemical structures. However, these models do not \emph{a priori} encode certain preferences a chemist may desire. We investigate the use of fine-tuning using Direct Preference Optimization to better align generated molecules with chemist preferences. Our findings suggest that this approach is simple, efficient, and highly effective.  ( 2 min )

  • Open

    [D] Is lang chain the right solution?
    Hello, I would love to have an LLm that can provide answers (in chat format) based some of the sql db data we have. Want it for an internal company project. I am by no means an expert but decent in programming and want to build a system to get answers in chat format. My understanding is that training LLMs ground up is prohibitively expensive and langchains are sort of hybrid , efficient solutions. Please suggest any other solutions. Also would Langchain being a company and not open source pose a problem in terms of copyrights? Thanks! submitted by /u/betelgeuseian [link] [comments]  ( 9 min )
    [R] MemGPT: Towards LLMs as Operating Systems - UC Berkeley 2023 - Is able to create unbounded/infinite LLM context!
    Paper: https://arxiv.org/abs/2310.08560 Github: https://github.com/cpacker/MemGPT Blog: https://memgpt.ai/ Youtube: https://youtu.be/QQ2QOPWZKVc?si=_bSSXU9EQE0FP64h MemGPT 🧠 Giving AI Unlimited Prompt Size (Big Step Towards AGI?) by Metthew Berman / Must watch and he also explains how to install it! Overview LLMs are increasingly being used for perpetual chats Limited context lengths makes perpetual chat challenging MemGPT manages a virtual context (inspired by virtual memory in operating systems) to create unbounded LLM context With MemGPT, we demonstrate that LLMs can be taught to manage their own memory! Abstract: Large language models (LLMs) have revolutionized AI, but are constrained by limited context windows, hindering their utility in tasks like extended conversa…  ( 9 min )
    [D] Some beginner questions about Whisper for transcription
    Hi, I am a mac user. I am trying to use whisper.cpp downloaded from its github file. I don't know much about phyton or coding so I basically followed this guide to install and use it. I downloaded the large model to try it. I am using it for non-English languages and I want to use it for language learning purposes so I can understand what is being said in an Instagram story or a Youtube video (without subtitles) or a tv series or an extract of movie etc. I was using Macwhisper but I wanted to try the pro features and I don't want to pay for it (for now) and try the pro models for non-English languages. My question is: all of my files that I want to transcribe are video files with .mp4 extension. Can I also transcribe those with whisper? If not, and if I can only transcribe audio files, can it be .mp3? I understand that I need to install and use ffmpeg. Does it support mp3? Also, as I understand, the transcripted text will appear in the terminal. Can I export it in -srt or pdf? Thanks submitted by /u/toughytough [link] [comments]  ( 9 min )
    [D] Transformers are basically CNNs?
    I've watched an interesting video: Deriving the Ultimate Neural Network Architecture from Scratch. It's about how to come up to the transformer architecture when you have an understanding of CNNs. The crux of it is an idea of pairwise convolutional layers. The first layer applies not to the sequence of words itself, but to all pairs of words in the sentence. This ensures that each relation of words that are far from each other is taken into account. The next convolutional layer applies to all pairs of results of the previous one. This way longer subsequences of words are factored in. pairs of words My question is: are there any articles on how transformers were invented? I see a lot of explanations of the original paper, but at best they all answer the question how transformers work. But why is the architecture the way it is? Was it discovered like the video describes? Or the path was more convoluted? I'd like to know more about this connection. Anyway, it would be great to figure out in all details how these pairwise layers are related to the concepts of query, key, and value. Here's what the author of the video wrote in comments: Yeah it's a term I made up so you won't find it in any sources, sorry about that. Usually sources will just talk about self attention in terms of key, query and value lookups, so you can look at those to get a more detailed understanding of the transformer. The value transform is equivalent to the linear representation function I use in the pairwise convolution, the key and query attention scores are equivalent to the bi-linear form scoring function I use (with the bi-linear form weight matrix given by Q^TK). I chose to use this unusual terminology because, personally, I feel the key, query and value terminology comes out of nowhere, and I wanted to connect the transformer more directly to its predecessor (the CNN). ​ submitted by /u/Veson [link] [comments]  ( 10 min )
    [R] Does this learning curve show any serious under/overfitting problems?
    I'm trying to fit a multivariate LSTM model to time series data to predict future values for one relatively noisy series. I noticed that the the loss (mse in this case) is pretty high given that the data has been standardized beforehand. So I really have two questions: why is the mse so high and is the learning curve indicative of any obvious problems? Thank you! https://preview.redd.it/r9bel6p7kfvb1.png?width=547&format=png&auto=webp&s=4eee53aa8005da8a89f330f6e98fe6cadde3467e submitted by /u/DifferenceUnhappy393 [link] [comments]  ( 9 min )
    [Discussion] Is the deadly triad real?
    Sutton and Barto’s textbook mentions that combing off-policy learning, bootstrapping, and function approximation leads to extreme instability and should be avoided. Yet when I encounter a reinforcement problem in the wild and look how people go about solving it, if someone’s solution involves bootstrapping more often than not it’s some variation of deep Q-learning. Why is this? submitted by /u/BiasedEstimators [link] [comments]
    [P] building a D&D NPC
    Hey everyone, I'm learning ML but i'm barely scratching the terminologies. 2 years ago I couldn't code anything but with school (python,sql and R) I learned fundamentals. I also have access to code academy. My current program is very machine learning/deep learning focused. On the side I DM a d&d game. Within the context of the world (eberron) robots are common. With my ADHD and being a new DM I want to outsource lore questions might have (that I would have to look up and slow down the game). The concept is to have a GUI and have the player interact with the chat bot. I've gotten to a proof of concept workflow. On Google colab. Thanks to langchain I managed to ingest pdfs and a url. Make then a directory, Embedded the text, bring it into a vector dB. Have the llm pull from the vector. Answer the question. Now I don't know what to do. I tried to bring the colab notebook onto Google cloud. But now cloud is becoming a rabbit home with vertex and docAI...and I don't want to deep dive into that, if it's a outside the scope of this "project" I'd appreciate any advice, links...etc. I got a limited success in botpress using a single pdf. It works but feel unsatisfying. N8N looks promising but if it's not intuitive then I don't want to go down that road. If I posted in the wrong group please direct me to the correct one. submitted by /u/work929 [link] [comments]  ( 9 min )
    [R] In-Context Pretraining: Language Modeling Beyond Document Boundaries
    https://arxiv.org/abs/2310.10638 "Large language models (LMs) are currently trained to predict tokens given document prefixes, enabling them to directly perform long-form generation and prompting-style tasks which can be reduced to document completion. Existing pretraining pipelines train LMs by concatenating random sets of short documents to create input contexts but the prior documents provide no signal for predicting the next document. We instead present In-Context Pretraining, a new approach where language models are pretrained on a sequence of related documents, thereby explicitly encouraging them to read and reason across document boundaries. We can do In-Context Pretraining by simply changing the document ordering so that each context contains related documents, and directly applying existing pretraining pipelines. However, this document sorting problem is challenging. There are billions of documents and we would like the sort to maximize contextual similarity for every document without repeating any data. To do this, we introduce approximate algorithms for finding related documents with efficient nearest neighbor search and constructing coherent input contexts with a graph traversal algorithm. Our experiments show In-Context Pretraining offers a simple and scalable approach to significantly enhance LMs'performance: we see notable improvements in tasks that require more complex contextual reasoning, including in-context learning (+8%), reading comprehension (+15%), faithfulness to previous contexts (+16%), long-context reasoning (+5%), and retrieval augmentation (+9%)." submitted by /u/Parking-Priority6217 [link] [comments]  ( 9 min )
    [R] Using Machine Learning to set parameters in sensors (College Project)
    Greetings, I'm on my 2nd year of College (Artificial Intelligence bachelors degree), and currently making a group project that will require machine learning. The project consists of managing and regulating the conditions (temperature, humidity, lightning, etc.) of the environment that surrounds important products (vaccines, human organs, etc.) during their transportation, using sensors implemented in their transportation box. For that being possible, our group was planning to use a predictive model using machine learning, to prevent cases such as the exposure of inappropriate temperature levels, that could damage the product, and subsequently taking the appropriate measures to improve the environment, before it reaches such dangerous scenarios. Therefore, I would like to know which tools and skills will be needed and helpful in order to achieve such goal. If you have any advice, that'll be very much appreciated. :) submitted by /u/Storm2003 [link] [comments]  ( 9 min )
    [R] 3D-GPT: A new method for procedural Text-to-3D model generation
    Researchers propose a new AI system called 3D-GPT that creates 3D models by combining natural language instructions and agents specialized for working with existing 3D modeling tools. 3D-GPT has predefined functions that make 3D shapes, and it tweaks parameters to build scenes. The key is getting the AI to understand instructions and pick the right tools. It has three main agents: A dispatcher that parses the text and picks generation functions A conceptualizer that adds details missing from the description A modeler that sets parameters and outputs code to drive 3D software By breaking modeling work down into steps, the agents can collab to match the descriptions. This is sort of like how a 3D modeling team of humans would work. The paper authors show it making simple scenes like "lush meadow with flowers" that fit the text. It also modifies scenes appropriately when given new instructions. I include some gifs of example outputs in my full summary. They look pretty good - I would say 2005-quality graphics. There are limits. It fully relies on existing generators, so quality is capped. Details and curves are iffy. It resorts to default shapes often instead of true understanding. And I doubt the verts and textures are well-optimized. The agent architecture seems to be really popular right now. This one shows some planning skills, which could extend to more creative tasks someday. TLDR: AI agents can team up to generate 3D models from text instructions. Works to some degree but limitations remain. Full summary. Paper here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [R] Bayesian Optimization-based Combinatorial Assignment
    Link: https://ojs.aaai.org/index.php/AAAI/article/view/25726/25498 Abstract: We study the combinatorial assignment domain, which includes combinatorial auctions and course allocation. The main challenge in this domain is that the bundle space grows exponentially in the number of items. To address this, several papers have recently proposed machine learning-based preference elicitation algorithms that aim to elicit only the most important information from agents. However, the main shortcoming of this prior work is that it does not model a mechanism's uncertainty over values for not yet elicited bundles. In this paper, we address this shortcoming by presenting a Bayesian optimization-based combinatorial assignment (BOCA) mechanism. Our key technical contribution is to integrate a method for capturing model uncertainty into an iterative combinatorial auction mechanism. Concretely, we design a new method for estimating an upper uncertainty bound that can be used to define an acquisition function to determine the next query to the agents. This enables the mechanism to properly explore (and not just exploit) the bundle space during its preference elicitation phase. We run computational experiments in several spectrum auction domains to evaluate BOCA's performance. Our results show that BOCA achieves higher allocative efficiency than state-of-the-art approaches. https://preview.redd.it/aeo36u3wldvb1.png?width=1288&format=png&auto=webp&s=2982547f8af51ed7195f49dbec9359fecba1693f ​ submitted by /u/Yossarian_1234 [link] [comments]  ( 9 min )
    [D] What is the latest method for models with multimodal outputs? How can the shared embedding used by a lot of multimodal models be dynamically "routed" to the proper modality during output?
    So a lot of multimodal models I've seen use a linear layer to transform encoded image/video/audio into the multimodal LLMs embedding space. This makes sense for the input, but how would output work? Normally you use a layer to convert the embedding to a SoftMax of probabilities of possible output tokens. This makes sense for discrete outputs like tokens but not for continuous outputs like images or audio. ​ submitted by /u/30299578815310 [link] [comments]  ( 9 min )
    [D] Is anyone else tired of “whatever OpenAI does is the best!” narrative?
    The title says it all. I agree what they did is incredible and literally changed AI landscape in last couple of years. But I’m getting tired of everyone acting like OpenAI is the only one doing great research. The twit-fluencers praising even the slightest peep from them. I don’t understand this fanaticism in AI community. There are smart researchers doing smart things all over the world. But they don’t even get a fraction of appreciation they deserve. And the strangest thing of all, ChatGPT is used as oracle to evaluate models in research papers. Consistency models are extremely meh and if it did not come out of openAI, people would’ve forgotten them a long time ago! Edit 1: I’m in grad school and that’s all a lot of students around me talk about/ chase. I want to work on a bit more fundamental problems, but I feel like I’m being left behind. Edit 2: This post is mostly a rant about academics obsessed with OpenAI research/products and LLMs. submitted by /u/mildlyphd [link] [comments]  ( 9 min )
    [P] Hacktoberfest Machine Learning Projects for JS/TS Developers 🎃
    Hey everyone,we have published an article about Hacktoberfest Projects 🎃 medium.com with a curated list of open-source machine learning GUI projects built with javascript or typescript. ​ https://preview.redd.it/nr4jfbqoscvb1.png?width=1352&format=png&auto=webp&s=fbb2313aabf0a617b6e426f1fa5018946b7ed7f5 🔍 Finding machine learning projects that are suitable for JS/TS developers during Hacktoberfest can be daunting due to the overwhelming abundance of open-source projects. We’ve simplified this process, offering you a refined selection of opportunities where your coding skills can shine and make a real impact. The Selection includes: Spotlight our powerful tool for intuitively exploring unstructured datasets directly from dataframes. Iteratives CML (Continuous Machine Learning) a command-line interface tool designed to enhance continuous integration and delivery (CI/CD) workflows. Inclusive Code Reviews: Browser Extension for improving online comments such as code reviews on Github or Azure DevOps. BeatBridge - A Music Player with a Recommendation Engine Each project offers a unique blend of challenges and learning opportunities, inviting you to contribute and grow your skills and knowledge in the dynamic world of open source. Choose a project that resonates with you, select an issue, and make an impact 🚀. submitted by /u/DocBrownMS [link] [comments]  ( 9 min )
    [R] AgentTuning: Enabling Generalized Agent Abilities for LLMs - Tsinghua University 2023 - Agent-tuned open model comparable to GPT-3.5-Turbo on unseen agent tasks!
    Paper: https://arxiv.org/abs/2310.12823 Github: https://github.com/THUDM/AgentTuning Model: https://huggingface.co/THUDM/agentlm-70b Abstract: Open large language models (LLMs) with great performance in various tasks have significantly advanced the development of LLMs. However, they are far inferior to commercial models such as ChatGPT and GPT-4 when acting as agents to tackle complex tasks in the real world. These agent tasks employ LLMs as the central controller responsible for planning, memorization, and tool utilization, necessitating both fine-grained prompting methods and robust LLMs to achieve satisfactory performance. Though many prompting methods have been proposed to complete particular agent tasks, there is lack of research focusing on improving the agent capabilities of L…  ( 9 min )
    [D] People working for (relatively) large organisations. How are LLMs accessed by employees within your organisation right now?
    I'm wondering whether LLMs within your organisation are widely used (including non-programmers), and in an (official) capacity that prevents OpenAI/Microsoft or another third party from using the input. Here, I'm talking about access by a wide variety of employees, not including as part of a data pipeline that doesn't have a user interface and only performs one job. Does your organization have a custom-built interface with enterprise access to an LLM? Use one of the open-source interfaces, or does your organisation provide access through i.e. Microsoft copilot? What about access to Github copilot (for programmers)? Or does your organisation have some kind of SAAS solution? If you have some kind of RAG within the organisation that isn't built-in into a product. What sort of stack do you use? Do you use OpenAI plugins to access this? submitted by /u/Background_Claim7907 [link] [comments]  ( 9 min )
    [D] Thoughts on Open-Domain QnA Systems?
    Been really interested in Open-Domain Question Answering these days and saw some interesting new models apart from the typical Retriever-Reader e.g. Generator-Retriever-Generator. Anyone particularly excited about anything new in the field - some new technique/model etc.? submitted by /u/Aggravating-Floor-38 [link] [comments]  ( 9 min )
    [N] State of AI Report 2023
    The State of AI Report for this year is out : https://www.stateof.ai/2023-report-launch A 160-slide presentation/report which seems quite exhaustive in the discussed topics, and provides a good view of the "hottest" research axes this year. Previous reports (yearly since 2019) are available on their website and have been generally well received in this sub. submitted by /u/ElkoSoltius [link] [comments]  ( 9 min )
    [R] Large Language Models as Analogical Reasoners
    https://arxiv.org/abs/2310.01714 "Chain-of-thought (CoT) prompting for language models demonstrates impressive performance across reasoning tasks, but typically needs labeled exemplars of the reasoning process. In this work, we introduce a new prompting approach, Analogical Prompting, designed to automatically guide the reasoning process of large language models. Inspired by analogical reasoning, a cognitive process in which humans draw from relevant past experiences to tackle new problems, our approach prompts language models to self-generate relevant exemplars or knowledge in the context, before proceeding to solve the given problem. This method presents several advantages: it obviates the need for labeling or retrieving exemplars, offering generality and convenience; it can also tailor the generated exemplars and knowledge to each problem, offering adaptability. Experimental results show that our approach outperforms 0-shot CoT and manual few-shot CoT in a variety of reasoning tasks, including math problem solving in GSM8K and MATH, code generation in Codeforces, and other reasoning tasks in BIG-Bench." https://preview.redd.it/f9azq40pwavb1.jpg?width=6390&format=pjpg&auto=webp&s=0af3de7925a6ef8f442e40f952849db2f544c3a7 submitted by /u/Parking-Priority6217 [link] [comments]
    [R] Large Language Models as Optimizers
    https://arxiv.org/abs/2309.03409 "Optimization is ubiquitous. While derivative-based algorithms have been powerful tools for various problems, the absence of gradient imposes challenges on many real-world applications. In this work, we propose Optimization by PROmpting (OPRO), a simple and effective approach to leverage large language models (LLMs) as optimizers, where the optimization task is described in natural language. In each optimization step, the LLM generates new solutions from the prompt that contains previously generated solutions with their values, then the new solutions are evaluated and added to the prompt for the next optimization step. We first showcase OPRO on linear regression and traveling salesman problems, then move on to prompt optimization where the goal is to find instructions that maximize the task accuracy. With a variety of LLMs, we demonstrate that the best prompts optimized by OPRO outperform human-designed prompts by up to 8% on GSM8K, and by up to 50% on Big-Bench Hard tasks." submitted by /u/Parking-Priority6217 [link] [comments]  ( 9 min )
    [D] Communities thoughts on r/singularity and other non-technical machine learning subreddits?
    I’ve seen many comments telling people to go to r/singularity, so I’ve been wondering about the communities thoughts on non-technical subreddits. Are they seen as a source of hype, getting newcomers more interested in the field and helping to advance knowledge? Or do you see such communities as an overly optimistic non-skeptical massive misinformation/active disinformation center? Do you think there’s something that can be done to improve these communities? What do you think their role should be relative to the technical communities? Do you have any specific criticisms? For those of you who think our two communities should be separate to what extent? submitted by /u/Username912773 [link] [comments]  ( 9 min )
    [R] Neural Relation Graph: A Unified Framework for Identifying Label Noise and Outlier Data (NeurIPS 2023)
    paper: https://arxiv.org/abs/2301.12321 code: https://github.com/snu-mllab/Neural-Relation-Graph TDLR: We present a scalable and domain-agnostic approach utilizing the relational structure of data for identifying label noise and outliers https://preview.redd.it/o9k7kliqe9vb1.png?width=3108&format=png&auto=webp&s=b7c34bd7f4bc130915440986570104f9bebd4f07 Diagnosing and cleaning data is a crucial step for building robust machine learning systems. However, identifying problems within large-scale datasets with real-world distributions is challenging due to the presence of complex issues such as label errors, under-representation, and outliers. In this paper, we propose a unified approach for identifying the problematic data by utilizing a largely ignored source of information: a relational structure of data in the feature-embedded space. To this end, we present scalable and effective algorithms for detecting label errors and outlier data based on the relational graph structure of data. We further introduce a visualization tool that provides contextual information of a data point in the feature-embedded space, serving as an effective tool for interactively diagnosing data. We evaluate the label error and outlier/out-of-distribution (OOD) detection performances of our approach on the large-scale image, speech, and language domain tasks, including ImageNet, ESC-50, and SST2. Our approach achieves state-of-the-art detection performance on all tasks considered and demonstrates its effectiveness in debugging large-scale real-world datasets across various domains. ​ Detected samples with label error (red colored) from ImageNet (top) and SST2 (bottom). ​ Detected outlier samples from ImageNet (top) and SST2 (bottom) validation sets. submitted by /u/janghyun1230 [link] [comments]  ( 9 min )
    [D] Future AI development on accessible hardware?
    Is there a future where models can run efficiently and at scale with just half a dozen high end consumer GPUs? A lot of people seem to think the bottleneck is "there's no competition for NVIDIA" but I actually think the current bottleneck is software. 4x 4090s is more CUDA cores, more transistors, more VRAM than a H100, but the performance and price difference is staggering, which should not be the case. Raspberry Pi 4s running faster desktops than a same generation Dell Inspiron prove that software integration is key. Cheap performance is laying on the table, it just has to be used more effectively by models and ML libraries submitted by /u/HovercraftForeign591 [link] [comments]  ( 9 min )
    [D] Online masters alternatives for MLOps
    Hi everyone, greeting from south america.. Basically I'm looking for an program to learn and improve my job opportunities in the MLOps field and at some point getting higher responsability positions. I recently got admitted for both OMSA and OMSCS from Gatech, but I feel those programs are more focused on the data science side of things. Is there any other alternative without GRE requeriment that you would recommend with a similar cost? Maybe I'm wrong about the aforementioned programs, if you think so, please let me know why. Thanks! submitted by /u/imatiasmb [link] [comments]  ( 9 min )
  • Open

    Sell Like Crazy with This One ChatGPT Prompt
    submitted by /u/Senior_tasteey [link] [comments]
    Amazon Tests Humanoid Robots in Warehouses
    submitted by /u/Master-Strawberry-26 [link] [comments]
    Reddit is considering a soft paywall if AI companies don't pay up
    Reddit is considering implementing a soft paywall on its content if generative AI companies do not agree to pay for using its data. This move comes as tensions rise between tech giants and content publishers over the financial stakes in the generative AI market. Reddit believes that its vast range of user-generated text makes it a goldmine for AI training data, but critics argue that much of the content is copied from other sources or links to third-party resources. Enforcing a soft paywall could provide leverage in negotiations with AI companies, but it may also alienate the Reddit community and impede the discovery of new content. Major newspapers like The New York Times and The Washington Post have also blocked AI companies from scraping their websites for training data. Enforcing a soft paywall is a double-edged sword for Reddit, as it could provide leverage in negotiations but also alienate the community and impede content discovery. Reddit's broken search engine is a major concern, and implementing a paywall could result in a significant loss of search traffic. If Reddit and other content giants implement paywalls, it could impact how generative AI models are trained and lead to increased expenses and a slower rate of innovation. This move by Reddit may pave the way for more publishers and platforms to implement paywalls, potentially reshuffling the industry. Source : https://stackdiary.com/reddit-thinks-its-data-is-worth-enforcing-a-log-in-page/ submitted by /u/NuseAI [link] [comments]
    Researchers propose 3D-GPT: combining LLMs and agents for procedural Text-to-3D model generation
    Researchers propose a new AI system called 3D-GPT that creates 3D models by combining natural language instructions and agents specialized for working with existing 3D modeling tools. 3D-GPT has predefined functions that make 3D shapes, and it tweaks parameters to build scenes. The key is getting the AI to understand instructions and pick the right tools. It has three main agents: A dispatcher that parses the text and picks generation functions A conceptualizer that adds details missing from the description A modeler that sets parameters and outputs code to drive 3D software By breaking modeling work down into steps, the agents can collab to match the descriptions. This is sort of like how a 3D modeling team of humans would work. The paper authors show it making simple scenes like "lush meadow with flowers" that fit the text. It also modifies scenes appropriately when given new instructions. I include some gifs of example outputs in my full summary. They look pretty good - I would say 2005-quality graphics. There are limits. It fully relies on existing generators, so quality is capped. Details and curves are iffy. It resorts to default shapes often instead of true understanding. And I doubt the verts and textures are well-optimized. The agent architecture seems to be really popular right now. This one shows some planning skills, which could extend to more creative tasks someday. TLDR: AI agents can team up to generate 3D models from text instructions. Works to some degree but limitations remain. Full summary. Paper here. submitted by /u/Successful-Western27 [link] [comments]
    AI — weekly megathread!
    News provided by aibrews.com ​ Adept open-sources Fuyu-8B - a multimodal model designed from the ground up for digital agents, so it can support arbitrary image resolutions, answer questions about graphs and diagrams, answer UI-based questions and more. It has a much simpler architecture and training procedure than other multi-modal models- there is no image encoder [Details]. Meta AI researchers present an AI system that can be deployed in real time to reconstruct, from brain activity, the images perceived and processed by the brain at each instant. It uses magnetoencephalography (MEG), a non-invasive neuroimaging technique in which thousands of brain activity measurements are taken per second [Details]. Scaled Foundations released GRID (General Robot Intelligence Development) - a p…
    People are grieving the 'death' of their AI companions after a chatbot app abruptly shut down
    submitted by /u/thisisinsider [link] [comments]
    Mind-blowing' IBM chip speeds up AI
    Researchers at IBM have developed a brain-inspired computer chip called NorthPole that can supercharge artificial intelligence (AI) by working faster with much less power. The chip eliminates the need to frequently access external memory, allowing it to perform tasks such as image recognition faster and consume less power. NorthPole runs neural networks and is made up of 256 computing units, each with its own memory. It beats existing AI machines in benchmark tests and uses one-fifth of the energy of state-of-the-art AI chips. However, it is not suitable for large language models and can only run pre-programmed neural networks. Source : https://www.nature.com/articles/d41586-023-03267-0 submitted by /u/NuseAI [link] [comments]
    Photograph of puddles reflecting the sky on a cobbled street.
    submitted by /u/IllustriousVideo6145 [link] [comments]
    One-Minute Daily AI News 10/20/2023
    In a fascinating development, a software engineer named Peter Whidden has trained an artificial intelligence (AI) algorithm to play the classic Pokémon games. Over the course of several years, the AI has spent over 50,000 hours playing the game and has amassed a large following on YouTube.[1] YouTube is developing a tool powered by artificial intelligence that would let creators record audio using the voices of famous musicians, according to people familiar with the matter.[2] Google taps gen-AI to help users in India search through government welfare schemes.[3] Huawei is rolling out a new HarmonyOS 4.0.0.126 software update for the Huawei Mate 60 Pro, which brings a new AI Cloud Image Enhancement feature and other important enhancements to the system.[4] Sources: [1] https://gameishard.gg/news/can-artificial-intelligence-play-pokemon/400727/ [2] https://www.bloomberg.com/news/articles/2023-10-19/youtube-working-on-tool-that-would-let-creators-sing-like-drake?embedded-checkout=true [3] https://news.yahoo.com/google-taps-gen-ai-help-063850226.html [4] https://www.huaweicentral.com/huawei-mate-60-pro-gets-a-cloud-image-enhancement-feature-google-pixel-8-pro-lags-behind/ submitted by /u/Excellent-Target-847 [link] [comments]
    Live Introduction to Core Machine Learning Concepts Course (Sailea)
    >Sailea is a student run non-profit that does not charge for any of its services Join the FIRST lesson of SAILea’s course on the Principals of AI! 🌳 Covers: Unsupervised, Supervised, and Reinforcement Learning; Overfitting, Underfiting, Confusion Matrix; Decision Trees 🗓️ October 21st ⏰ 7:00-8:00PM EST Why Sailea? Only course targeted at high schoolers Free Forever Join Us Now! 👉 (signup form) https://docs.google.com/forms/d/e/1FAIpQLSfQGCeZClTdF6zeIQ-RtbOGR582bb1slc3oR0zG2J7j1v5RHg/viewform?usp=sf_link 🌳 Register today, get involved in the community and grow your knowledge! submitted by /u/Envoy-Insc [link] [comments]
  • Open

    DQN with a binary vector as output
    Heey everyone! I hope you're doing well. I need your help guys. I'm working on a DQN that outputs a binary vector of length L (I just applied sigmoid function on the ouptut layer and take p>0.5 as 1 and 0 otherwise). In this setting, how can modify the below code to update my DQN: def update(self): states, actions, rewards, next_states, dones = self.memory.sample(self.batch_size) states = torch.FloatTensor(np.array(states)) actions = torch.LongTensor(np.array(actions)) rewards = torch.FloatTensor(np.array(rewards)) next_states = torch.FloatTensor(np.array(next_states)) dones = torch.FloatTensor(np.array(dones)) q_values = self.model(states) q_values = q_values.gather(1, actions.unsqueeze(1)) next_q_values = self.target_model(next_states).detach() expected_q_values = rewards + self.gamma * (1 - dones) * next_q_values.max(1)[0] expected_q_values = expected_q_values.unsqueeze(1) loss = nn.BCELoss(q_values, expected_q_values) self.optimizer.zero_grad() loss.backward() self.optimizer.step() submitted by /u/GuavaAgreeable208 [link] [comments]
    Is the “Deadly Triad” even real?
    Sutton and Barto’s textbook mentions that combing off-policy learning, bootstrapping, and function approximation leads to extreme instability and should be avoided. Yet when I encounter a reinforcement problem in the wild and look how people go about solving it, if someone’s solution involves bootstrapping more often than not it’s some variation of deep Q-learning. Why is this? submitted by /u/BiasedEstimators [link] [comments]
    Dead simple explanations of popular RL concepts (open source)
    Hey everyone! I just started an open-source repo for RL explanations. https://github.com/DenseLayers/densewiki Many people, especially beginners struggle to develop the intuition around concepts (like actor-critic vs advantage actor-critic, GAE, PPO, etc). Often it's nice to see what's happening at a high level first, before we dive deeper into the math. That's what I'm trying to do here. But I can't do it alone, so I'm posting here to get help from others in the community to make sure the explanations are clear, extremely approachable, and accurate. If you'd like to work with me on this (whether you're a complete beginner or very knowledgeable), please reach out! ​ submitted by /u/mngrwl [link] [comments]
    Reinforement learning on the game "Quarto"
    hello, i am working on solving this board game called "Quarto" where we have 16 different pieces. but these pieces have attributes in common they black or white, short or tall, hollow top or closed top, and square shaped or circle shaped pieces each piece has four attributes. the winning condition is to place 4 pieces consecutively in a 4X4 board with at least one attribute in common to win. and also we hadve to choose the piece for the opponent to make and then opponent places that piece and gives us a piece to move. so there are two actions. i have made the action space as 256 + 16 where 256=16*16 as all pieces can be place anywhere on the board and the last 16 is the last possible move that is the move which leads to a terminating state so the next_piece for the opponent would be blank …
    What is the optimal way to train a PPO?
    Hello! I've got a really simple question, i'm training a PPO algorithm and I wanna know what is the best way to train my model? Sorry, I'll try to be clear! So right now what i'm doing is : I'm loading a previously trained PPO model Train the model on 20000 timesteps Evaluate the reward of the newly trained PPO model at the end of the timesteps and compare it to the reward from the model loaded in 1 If the reward is greater then i'm going back to step 1 and using the new model If not then i'm going back to step 1. Is it a correct way to do so? Thanks a lot and have a great day! submitted by /u/PointNo1904 [link] [comments]
    new chess dataset: 3.2b games (608b moves) generated by 2500-ELO Stockfish selfplay {LAION}
    submitted by /u/gwern [link] [comments]
  • Open

    Governing the ML lifecycle at scale, Part 1: A framework for architecting ML workloads using Amazon SageMaker
    Customers of every size and industry are innovating on AWS by infusing machine learning (ML) into their products and services. Recent developments in generative AI models have further sped up the need of ML adoption across industries. However, implementing security, data privacy, and governance controls are still key challenges faced by customers when implementing ML […]  ( 16 min )
    How Meesho built a generalized feed ranker using Amazon SageMaker inference
    This is a guest post co-written by Rama Badrinath, Divay Jindal and Utkarsh Agrawal at Meesho. Meesho is India’s fastest growing ecommerce company with a mission to democratize internet commerce for everyone and make it accessible to the next billion users of India. Meesho was founded in 2015 and today focuses on buyers and sellers […]  ( 6 min )
  • Open

    Answering billions of reporting queries each day with low latency
    Posted by Jagan Sankaranarayanan, Senior Staff Software Engineer, and Indrajit Roy, Head of Napa Product, Google Google Ads infrastructure runs on an internal data warehouse called Napa. Billions of reporting queries, which power critical dashboards used by advertising clients to measure campaign performance, run on tables stored in Napa. These tables contain records of ads performance that are keyed using particular customers and the campaign identifiers with which they are associated. Keys are tokens that are used both to associate an ads record with a particular client and campaign (e.g., customer_id, campaign_id) and for efficient retrieval. A record contains dozens of keys, so clients use reporting queries to specify keys needed to filter the data to understand ads performance (e.…  ( 93 min )
  • Open

    For the World to See: Nonprofit Deploys GPU-Powered Simulators to Train Providers in Sight-Saving Surgery
    GPU-powered surgical-simulation devices are helping train more than 2,000 doctors a year in lower-income countries to treat cataract blindness, the world’s leading cause of blindness, thanks to the nonprofit HelpMeSee. While cataract surgery has a success rate of around 99%, many patients in low- and middle-income countries lack access to the common procedure due to Read article >  ( 6 min )
    Eureka! NVIDIA Research Breakthrough Puts New Spin on Robot Learning
    A new AI agent developed by NVIDIA Research that can teach robots complex skills has trained a robotic hand to perform rapid pen-spinning tricks — for the first time as well as a human can. The stunning prestidigitation, showcased in the video above, is one of nearly 30 tasks that robots have learned to expertly Read article >  ( 6 min )
  • Open

    Article: Computer Vision in Agriculture. Challenges & Solutions.
    ​ https://preview.redd.it/j3nmj31llcvb1.jpg?width=2500&format=pjpg&auto=webp&s=c09804179e4f40a854e1327fa9150f1ab0c0dfd0 Interesting article about use cases of data augmentation in agricultural industry. Short description: In this article, you will cover: • How computer vision solutions are transforming the agricultural industry. • Observe the importance of quality data for developing AI solutions that perform crop and livestock analysis and monitoring with high and steady accuracy. • Explore the use of synthetic data to facilitate data collection in various conditions. • Take a look at examples of tasks in agriculture. How can we solve them with computer vision, and how can we apply synthetic data to extend the augmentation? More details are here submitted by /u/No-Independence5880 [link] [comments]
  • Open

    The 19th rule of HIPAA Safe Harbor
    The HIPAA Safe Harbor provision says that data can be considered deidentified if 18 kinds of data are removed or reported at low resolution. At the end of the list of 18 items, there is an extra category, sometimes informally called the 19th rule: The covered entity does not have actual knowledge that the information […] The 19th rule of HIPAA Safe Harbor first appeared on John D. Cook.  ( 5 min )

  • Open

    How Many Businesses Use AI?
    submitted by /u/Senior_tasteey [link] [comments]
    Is the Roko Basilisk Thought Experiment Forbidden To Talk About?
    I was reading this article on Roko's basilisk and it reminded me of the long debates I had about it 10 years ago. The idea of a sentient AI keeping a grudge against those who didn't help in its creation, and condemning them is fascinating. And I don't quite understand why LessWrong stopped Basilisk. What if we are already in the Basilisk's simulation? WHat if LessWrong never pulled the plug? submitted by /u/fookingyeah [link] [comments]
    Conversing with Vulnerabilities: AI-Assisted CVE Search
    submitted by /u/Zimmax [link] [comments]
    YouTube wants to launch an AI-powered tool that lets you sound like your favorite singer, report says
    submitted by /u/thisisinsider [link] [comments]
    College Student looking for advice
    I'm a sophomore at a small college, and I'm coming up on scheduling for the classes that are about to start actually mattering, and I need some advice. I'm highly interested in both robotics and AI, but I'm not sure what to major in (likely double major). I know CS is a common tie between the two fields, but I'm not sure what additional major to include. I can choose either data science or physics. I could also technically include ME but I'm much less inclined to do so. Any advice is appreciated! submitted by /u/Inferno980 [link] [comments]
    Thoughts on a global compute cap for potential AGI projects?
    There's been a bunch of discourse in the run up to the November AI Safety Summit in the UK about what safety policies should be in place. ARC Evals & Anthropic are pushing for 'Responsible Scaling', which doesn't put any hard upper limits on the about of compute that powerful models can use. There are others who think we need a global compute cap. Thoughts enforcing a ceiling for the amount of compute/FLOP that both state & non-state actors can use? submitted by /u/Seamus127 [link] [comments]
    Artificial Revolution | AI Technology and its effects on the Labour Market.
    submitted by /u/senploxart [link] [comments]
    EU Elections at Risk with Rise of AI-Enabled Information Manipulation
    The 11th edition of the Threat Landscape report by the European Union Agency for Cybersecurity (ENISA) highlights the risks posed by AI-enabled information manipulation in the upcoming EU elections. The report recorded approximately 2580 incidents during the reporting period, with 220 incidents specifically targeting two or more EU Member States. The sectors mostly targeted include public administrations (19%) and health (8%), with a cascading effect observed due to interdependencies. Information manipulation campaigns are considered a major threat to election processes, with individuals (47%) and public administration (29%) being the primary targets. The report also provides an overview of evolving trends in threat actors, including state-nexus actors targeting key individuals through spear phishing and social networks. Ransomware and DDoS attacks remain the top threats, accounting for 34% and 28% of all threats, respectively. The motivations behind these threats include financial gain, disruption, espionage, destruction, and ideology. The report highlights the potential misuse of artificial intelligence-powered chatbots in phishing attempts, information manipulation, and cybercrime. Older techniques like search engine optimization (SEO) poisoning and malvertising have also seen a resurgence among cybercrime actors. The report concludes by emphasizing the importance of addressing vulnerabilities and ensuring cybersecure infrastructures for the integrity and availability of information in the EU electoral process. Source : https://www.enisa.europa.eu/news/eu-elections-at-risk-with-rise-of-ai-enabled-information-manipulation submitted by /u/NuseAI [link] [comments]
    Is chatgpt,Bard,Poe,Bing ai chatbot ai or research and Analysis ai?
    Tia submitted by /u/Emad_341 [link] [comments]
    One-Minute Daily AI News 10/19/2023
    NVIDIA has announced that its open-source TensorRT-LLM library, formerly limited to data center usage, is now accessible for Windows personal computers.[1] Microsoft just shipped Azure AI Content Safety to general availability. It’s an AI-powered platform designed to “help organizations create safer online environments.”[2] Mozilla Brings a Fake Review Checker AI Tool to Firefox.[3] Nvidia and iPhone maker Foxconn to build ‘AI factories’.[4] Sources: [1] https://winbuzzer.com/2023/10/18/nvidia-unveils-tensorrt-llm-tool-to-boost-ai-language-model-performance-on-windows-pcs-xcxwbn/ [2] https://www.windowscentral.com/software-apps/microsoft-wants-to-make-ai-safer-and-it-just-unveiled-a-service-to-help [3] https://www.marktechpost.com/2023/10/17/mozilla-brings-a-fake-review-checker-ai-tool-to-firefox/ [4] https://www.bbc.com/news/business-67153669 submitted by /u/Excellent-Target-847 [link] [comments]
    Is chatgpt,Bard,Poe,Bing ai chatbot ai or research and Analysis ai?
    Thank you submitted by /u/Emad_341 [link] [comments]
    Danny Davinci
    submitted by /u/chuck-yeah [link] [comments]
    OpenAI Kills Arrakis
    submitted by /u/Agitated-Spell3979 [link] [comments]
    The insane AI power of DALL-E 3
    submitted by /u/the_anonymizer [link] [comments]
    AI Is Booming. This Is How CEOs Are Using It
    AI is having a significant impact on the direction of products for CEOs, who are committing talent and resources to building AI capabilities. Incumbent platforms like OpenAI and AWS are dominating the AI market. Coding co-pilots like GitHub Co-Pilot are widely adopted. The adoption of AI tools, including coding co-pilots, is not leading to a reduction in engineering headcount for most CEOs. However, some CEOs have reported that co-pilots have reduced their future hiring needs. The landscape of AI tools is expected to continue shifting, with more second order effects and value-add use cases emerging. Source : https://www.flexcapital.com/post/ai-is-booming-this-is-how-ceos-are-actually-using-it submitted by /u/NuseAI [link] [comments]  ( 9 min )
  • Open

    machine learning on a microcontroller [P]
    i am making an EEG machine for a university project, i will be taking in an analogue signal and converting it to digital, i then will be sending the varying voltages to a microcontroller in hopes that it will be able to catagorise them in either states of mind or as simply as telling whether or not the persons eyes are open or closed. i have very little knowledge on machine learning but it is required to be implemented in the project, my lecturer is pressuring me to have final pick of what software and microcontroller iw will be using for this project, everyone else in the class are using Edge Impulse which the lecturer said wouldn't be applicable to me as it uses accelerometers and voice. and are using CY8CKIT-042 PSoC 4 PIONEER KITS which apperently arent suited for me either. any help would be much appreciated and i do apologise if this is too rambly. submitted by /u/disslixac [link] [comments]  ( 9 min )
    [R] OpenAgents: An Open Platform for Language Agents in the Wild - The University of Hong Kong 2023
    Paper: https://arxiv.org/abs/2310.10634v1 Github: https://github.com/xlang-ai/OpenAgents Abstract: Language agents show potential in being capable of utilizing natural language for varied and intricate tasks in diverse environments, particularly when built upon large language models (LLMs). Current language agent frameworks aim to facilitate the construction of proof-of-concept language agents while neglecting the non-expert user access to agents and paying little attention to application-level designs. We present OpenAgents, an open platform for using and hosting language agents in the wild of everyday life. OpenAgents includes three agents: (1) Data Agent for data analysis with Python/SQL and data tools; (2) Plugins Agent with 200+ daily API tools; (3) Web Agent for autonomous web browsing. OpenAgents enables general users to interact with agent functionalities through a web user interface optimized for swift responses and common failures while offering developers and researchers a seamless deployment experience on local setups, providing a foundation for crafting innovative language agents and facilitating real-world evaluations. We elucidate the challenges and opportunities, aspiring to set a foundation for future research and development of real-world language agents. https://preview.redd.it/syl2gzh3q8vb1.jpg?width=1084&format=pjpg&auto=webp&s=4045d3abb5cdb7587614795e709cdaba03bc122d https://preview.redd.it/aus342i3q8vb1.jpg?width=1086&format=pjpg&auto=webp&s=73de7976db5a8bbed880350fab8ab56be3fee550 https://preview.redd.it/qstz81i3q8vb1.jpg?width=1346&format=pjpg&auto=webp&s=1626482556a90abf418abb5d56f8e5599cb1e3d6 submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [Research] Hypernymy-based approach for text-to-image models (Blog post)
    Text-to-image models have rapidly progressed in recent years, but most popular evaluation metrics (such as FID) do not consider their linguistic abilities. A new approach measures how well these models understand subtype relations between concepts. Researchers from Yandex proposed two metrics that combine well-known tools like the WordNet database and ImageNet classifiers in a novel way, allowing them to analyze models like Stable Diffusion in more detail. Blog post. submitted by /u/metkere [link] [comments]  ( 9 min )
    [R] Can Large Language Models Explain Themselves? A Study of LLM-Generated Self-Explanations
    Large language models (LLMs) such as ChatGPT have demonstrated superior performance on a variety of natural language processing (NLP) tasks including sentiment analysis, mathematical reasoning and summarization. Furthermore, since these models are instruction-tuned on human conversations to produce "helpful" responses, they can and often will produce explanations along with the response, which we call self-explanations. For example, when analyzing the sentiment of a movie review, the model may output not only the positivity of the sentiment, but also an explanation (e.g., by listing the sentiment-laden words such as "fantastic" and "memorable" in the review). How good are these automatically generated self-explanations? In this paper, we investigate this question on the task of sentiment analysis and for feature attribution explanation, one of the most commonly studied settings in the interpretability literature (for pre-ChatGPT models). Specifically, we study different ways to elicit the self-explanations, evaluate their faithfulness on a set of evaluation metrics, and compare them to traditional explanation methods such as occlusion or LIME saliency maps. Through an extensive set of experiments, we find that ChatGPT's self-explanations perform on par with traditional ones, but are quite different from them according to various agreement metrics, meanwhile being much cheaper to produce (as they are generated along with the prediction). In addition, we identified several interesting characteristics of them, which prompt us to rethink many current model interpretability practices in the era of ChatGPT(-like) LLMs. https://arxiv.org/abs/2310.11207 submitted by /u/zyl1024 [link] [comments]  ( 9 min )
    [Discussion] Machine Learning for Mechanical Engineering
    Hello all, ​ I'm a mechanical engineer learning machine learning, I found many specializations on Coursera by Google, DeepLearing.AI, and IBM, but I really can't tell which of them will be the best fit for me, so I would like to hear your recommendations, actually, I got financial aid for the specialization by DeepLearning AI and finished the first course, but I'm not satisfied I feel like I will not be a professional by this course ​ my goal is to master data analysis and ML to work as a freelancer and increase my chances of finding a funded master's degree. submitted by /u/Mobile_Ad_4573 [link] [comments]  ( 9 min )
    [Discussion] Scientific and Data-Intensive Computing study plan
    Hi everyone! I'm a graduate student in Scientific and Data-Intensive Computing at the University of Trieste (Italy) and I'm writing this post because I want to ask you a feedback about my study plan :) 1st semester 2nd semester 3rd semester 4th semester Statistical methods Deep Learning Simulation Intelligence and Learning for Autonomous Systems Parallel Programming for High-Performance Computing High-Performance Computing Advanced Algorithms for Scientific Computing Advanced Topics in Scientific Computing Cloud Computing Advanced Numerical Analysis Advanced Deep Learning Software Development Practices Advanced High-Performance Computing Numerical Analysis Probabilistic Machine Learning Thesis Thesis You can find all the programs of the courses on this website On the following websites, you can find a lot of courses that I could add to my study plan Scientific Computing Courses Data science courses About me I have a Bachelor's degree in Computer Science (University of Rome) I am a Research Intern at an AI startup I will do a Summer Research Internship in the field of (HPC) ∩ (Machine Learning) I don't already know what my thesis will be about but I'm really interested in High-Performance Computing, Computational Mathematics, Machine Learning, and Simulations I would like to work in a research context; I'm considering doing a PhD in Scientific Computing (In that case, I would try to apply to American Universities) I'm available for further clarification :) Thank you in advance submitted by /u/PragmaticScientist [link] [comments]  ( 9 min )
    [D] Has anybody heard back from NeurIPS financial aid yet?
    Was supposed to be Monday but instead it's rolling submitted by /u/notasketchyperson [link] [comments]  ( 9 min )
    [D] Need advice for medical text processing
    I am working on a research project that involves analysing medical text (patient records) to identify key events. Initially I was planning to use chatgpt api and then compare its performance with open source LLMs. However, I've just come across Amazon Comprehend Medical, which seems to be specifically designed for what I need. Has anyone tried it? I would expect it to be better than chatgpt + plugins, as it says it was trained with medical language. This also makes me wonder if there are opensource LLMs specifically trained for the medical field. Does anyone have experience with this? submitted by /u/kiukamba [link] [comments]  ( 9 min )
    [Project] Scaling LLama2 70B with Multi NVIDIA and AMD GPUs under 3k budget
    Big LLMs are memory bound, one way to break that limit is to make use of multiGPUs. The recent development of MLC LLM project makes it possible to compile and deploy large-scale language models running on multi-GPU systems with support for NVIDIA and AMD GPUs with high performance. Specifically, it can run 4-bit quantized Llama2-70B at 34.5 tok/sec on two NVIDIA RTX 4090 and 29.9 tok/sec on two AMD Radeon 7900XTX. This is a first solution that helps us to scale 70B models with multiple GPUs, bringing the potential to run even larger open LLMs under reasonable budget (the two AMD GPUs cost 2k) ​ - Project https://github.com/mlc-ai/mlc-llm - Blogpost https://blog.mlc.ai/2023/10/19/Scalable-Language-Model-Inference-on-Multiple-NVDIA-AMD-GPUs ​ ​ submitted by /u/crowwork [link] [comments]  ( 9 min )
    [D] Is there a way to get world level timestamps with whisper (using DTW based alignment) without having to host your own model?
    I don't understand how this isn't talked about more, given how many projects/products I've seen that have time level timestamps with whisper. I understand whisper isn't a traditional CTC model like wave2vec, and i understand that there are plenty of tutorials out there for doing dtw-based alignment. I know whisper-timestamped exist, and whisperx. The thing is, all these solutions assume you have the infrastructure to host your own whisper model. I am just getting started on my product, and I simply don't see the point in paying over 300/mo for a g4 instance (the cheapest GPU instance in AWS) just for an MVP. ​ Has anyone been able to take the whisper API output, and align that using the sound bites and get timestamps? Is running your own whisper model the only way? Thank you! submitted by /u/latent_space_tennis [link] [comments]  ( 9 min )
    [D] what metrics do you use to track GPU performance during training and/or inference?
    Hello people! I used to rely on GPU Usage to track how effectively I was able to leverage the gpu or cluster provided by my company from Grafana dashboard, however yesterday I saw X someone on X/Twitter saying: "Utilization is a poor metric by itself. You can easly hit 100% where the GPU is doing a lot of waiting. Power consumption is a better (but not perfect) measure. If you're burning watts it's usually doing something useful. High util, no watts is not good." Which it's something that I've never considered before! Now I'm quite curious to hear if anyone here have considered this approach before or alternative ways to measure the performance of the GPU resource/cluster. submitted by /u/pirate7777777 [link] [comments]  ( 9 min )
    [R] Create 3d model of face with 4 normal images
    Hi guys, I'm looking for an AI application or way to create this in < 10' with proper accuracy. Does anybody know anything? Quality should be good enough to print it. submitted by /u/Reasonable_Cream_520 [link] [comments]  ( 9 min )
    [P] Higgsfield: Distributed LLM training and cluster management framework
    https://github.com/higgsfield-ai/higgsfield submitted by /u/Good-Willingness-985 [link] [comments]  ( 8 min )
    [D] A clear visual and intuitive explanation of Neural Attention
    Hello guys, I made a video for my YT channel breaking down Neural Attention with some intuitive examples and representative projects. Here is the link for those interested, all feedback is appreciated! https://youtu.be/frosrL1CEhw?si=NKTqmRTieVkfCNlb ​ submitted by /u/AvvYaa [link] [comments]  ( 9 min )
    [D] Advantage of VAE's compared to regularized AE's
    I'm trying to come up to speed on VAE's. My intuitive concept of a VAE is an AE for which we want to enforce some distributional regularity on the latent encodings. Why not accomplish this by simply regularizing the latent encodings directly? For example, we could assert that the latent vectors are drawn from a zero-mean, identity-matrix-covariance Gaussian distribution. So that e.g. the loss function becomes: Loss(X) = ReconstructionLoss(Decoder(Encoder(X))) + LogPriorProbability(Encoder(X)) In a variant of this, we could add a hyperparameter coefficient for the prior loss component. Here, there is no "reparameterization trick" because the encoder is not stochastic. We simply regularize the latent encodings directly. If the encoder does not make the data X distribution look like the targeted Gaussian, it's a "less good" encoder. In principle we ought to still be able to generate X's by sampling from the prior and passing it through the decoder. This seems (to me) like the simplest way to regularize the latent space. Why do VAE's, by contrast, introduce the new machinery of a stochastic encoder? submitted by /u/OneQuadrillionOwls [link] [comments]  ( 9 min )
    [D] Run AI Model. Multiple k80 vs RTX 4090?
    I want to build a machine for run multiple type of Ai Model like picture generation, chatbot, summarization, etc. I also want to train my own models. Is it better to use multiple(6/7) k80 or something like that or buy a RTX 4090? submitted by /u/ilkap2005 [link] [comments]  ( 9 min )
    [D] MLOps Tool for Hyperparametertuning, Distributed Training, etc
    Currently I train many AI models directly in my Jupyterlab notebooks and do something like hyperparameter tuning, evaluation of losses/accuracy directly in the notebook using lists and matplotlib. I want to finally switch to a MLOPs webUI and have discovered tools like ClearML and Determined.Ai. ​ Each of these GUIs has certain advantages/disadvantages for me and therefore I would like to hear from the community how you do it, which tools you use, if you do it alone or in a team and how your workflow is. Until now I often had the impression that you develop your Jupyternotebook normally, then add a few lines of code for the respective tool and then continue in the UI, but here I lack for example the understanding of how I then jump from the MLOps UI back into the notebook, how I keep them synchronous, if I want to change something fundamental in the code again. ​ Thanks in advance submitted by /u/Sensitive_Limit1620 [link] [comments]  ( 9 min )
    [R] Jointly Training Large Autoregressive Multimodal Models https://arxiv.org/abs/2309.15564
    In recent years, advances in the large-scale pretraining of language and text-to-image models have revolutionized the field of machine learning. Yet, integrating these two modalities into a single, robust model capable of generating seamless multimodal outputs remains a significant challenge. To address this gap, we present the Joint Autoregressive Mixture (JAM) framework, a modular approach that systematically fuses existing text and image generation models. We also introduce a specialized, data-efficient instruction-tuning strategy, tailored for mixed-modal generation tasks. Our final instruct-tuned model demonstrates unparalleled performance in generating high-quality multimodal outputs and represents the first model explicitly designed for this purpose. ​ https://arxiv.org/abs/2309.15564 What do you think about this work? Seems pretty huge, they build the first pure autoregressive interleaved text and image generator. Please let me know your opinion on this. Paper by Meta AI. submitted by /u/Present_Chicken5393 [link] [comments]  ( 9 min )
    [R] Curve your Enthusiasm: Concurvity Regularization in Differentiable Generalized Additive Models
    Accepted at NeurIPS 2023 Link: https://arxiv.org/abs/2305.11475 Authors: Julien Siems, Konstantin Ditschuneit, Winfried Ripken, Alma Lindborg, Maximilian Schambach, Johannes Otterbach, Martin Genzel *equal contribution Abstract: Generalized Additive Models (GAMs) have recently experienced a resurgence in popularity due to their interpretability, which arises from expressing the target value as a sum of non-linear transformations of the features. Despite the current enthusiasm for GAMs, their susceptibility to concurvity - i.e., (possibly non-linear) dependencies between the features - has hitherto been largely overlooked. Here, we demonstrate how concurvity can severly impair the interpretability of GAMs and propose a remedy: a conceptually simple, yet effective regularizer which penalizes pairwise correlations of the non-linearly transformed feature variables. This procedure is applicable to any differentiable additive model, such as Neural Additive Models or NeuralProphet, and enhances interpretability by eliminating ambiguities due to self-canceling feature contributions. We validate the effectiveness of our regularizer in experiments on synthetic as well as real-world datasets for time-series and tabular data. Our experiments show that concurvity in GAMs can be reduced without significantly compromising prediction quality, improving interpretability and reducing variance in the feature importances. Keywords: Interpretable Machine Learning, Generalized Additive Models, Concurvity, Multicollinearity, Regularization, Time-Series Forecasting, Interpretability submitted by /u/Yossarian_1234 [link] [comments]  ( 9 min )
    [P] Strategic Game Datasets for Enhancing AI planning: An invitation for collaborative research
    Large dataset release of strategic gameplay from LAION https://laion.ai/blog/strategic-game-dataset/ Dataset Overview Chess The chess dataset comprises 3.2 billion games, equating to approximately 608 billion individual moves. These games, generated via self-play by the Stockfish engine, emulate a high strategic complexity, reflective of a 2500 Elo rating. Each entry contains detailed move sequences, termination status, and game results. Rubik's Cube (3x3x3) The rubik's cube dataset features 1.64 billion Rubik's Cube solves, totaling roughly 236.39 billion moves. It provides initial scrambled states and the ensuing solve sequences, offering a complex problem-solving scenario for models to navigate. Mazes The maze dataset, while smaller at 350,000 mazes, represents over 39.29 billion moves. Each maze is a 30x30 ASCII representation, with solutions derived using the A* algorithm, challenging pathfinding and planning algorithms. submitted by /u/hardmaru [link] [comments]  ( 9 min )
    [R] Set-of-Mark (SoM) Unleashes Extraordinary Visual Grounding in GPT-4V
    We are introducing a magic Set-of-Mark (SoM) prompting for GPT-4V! Simply overlaying a set of marks on the image immediately unleashes the visual grounding power of GPT-4V! Left: GPT-4V Default Right: GPT-4V + SoM Many people including myself have been impressed by the general intelligence to understand images, but also questioning its visual grounding capability. After spending the last week or two, I am really shocked by the power of GPT-4V after plugging our SoM prompting. It can not only do a lot of fine-grained vision tasks but also can perform visual reasoning and project its world knowledge to the visual inputs! To extract meaningful regions, we compiled a new SoM toolbox with a number of interactive image segmentation tools, like our own MaskDINO, SEEM, Semantic-SAM, and also SAM…  ( 10 min )
    [R] Mamba: Linear-Time Sequence Modeling with Selective State Spaces
    submitted by /u/LABTUD [link] [comments]  ( 9 min )
  • Open

    Announcing Rekogniton Custom Moderation: Enhance accuracy of pre-trained Rekognition moderation models with your data
    Companies increasingly rely on user-generated images and videos for engagement. From ecommerce platforms encouraging customers to share product images to social media companies promoting user-generated videos and images, using user content for engagement is a powerful strategy. However, it can be challenging to ensure that this user-generated content is consistent with your policies and fosters […]  ( 7 min )
    Defect detection in high-resolution imagery using two-stage Amazon Rekognition Custom Labels models
    High-resolution imagery is very prevalent in today’s world, from satellite imagery to drones and DLSR cameras. From this imagery, we can capture damage due to natural disasters, anomalies in manufacturing equipment, or very small defects such as defects on printed circuit boards (PCBs) or semiconductors. Building anomaly detection models using high-resolution imagery can be challenging […]  ( 8 min )
    Automatically redact PII for machine learning using Amazon SageMaker Data Wrangler
    Customers increasingly want to use deep learning approaches such as large language models (LLMs) to automate the extraction of data and insights. For many industries, data that is useful for machine learning (ML) may contain personally identifiable information (PII). To ensure customer privacy and maintain regulatory compliance while training, fine-tuning, and using deep learning models, […]  ( 12 min )
  • Open

    Next-Level Computing: NVIDIA and AMD Deliver Powerful Workstations to Accelerate AI, Rendering and Simulation
    To enable professionals worldwide to build and run AI applications right from their desktops, NVIDIA and AMD are powering a new line of workstations equipped with NVIDIA RTX Ada Generation GPUs and AMD Ryzen Threadripper PRO 7000 WX-Series CPUs. Bringing together the highest levels of AI computing, rendering and simulation capabilities, these new platforms enable Read article >  ( 5 min )
    NVIDIA AI Now Available in Oracle Cloud Marketplace
    Training generative AI models just got easier. NVIDIA DGX Cloud AI supercomputing platform and NVIDIA AI Enterprise software are now available in Oracle Cloud Marketplace, making it possible for Oracle Cloud Infrastructure customers to access high-performance accelerated computing and software to run secure, stable and supported production AI in just a few clicks. The addition Read article >  ( 6 min )
    Coming in Clutch: Stream ‘Counter-Strike 2’ From the Cloud for Highest Frame Rates
    Rush to the cloud — stream Counter-Strike 2 on GeForce NOW for the highest frame rates. Members can play through the newest chapter of Valve’s elite, competitive, first-person shooter from the cloud. It’s all part of an action-packed GFN Thursday, with 22 more games joining the cloud gaming platform’s library, including Hot Wheels Unleashed 2 Read article >  ( 5 min )
  • Open

    DreamerV2 stochastic decoders
    Hello, I am implementing the code for the paper DreamerV2, and there are some things that look a bit strange to me. The predictors and, in particular, the image and the reward predictors are stochastic and they output Normal distributions. Both the normal distributions have the mean, which is the output of the respective models, and the variance is 1. Usually, in RL we normalize observations and rewards to be between 0 and 1, and in such a case I don't know if it's reasonable to sample from a Gaussian with variance one. I don't know about the specific preprocessing done in DreamerV2, except in the paper DreamerV1, where in section 6 (Control tasks), they say that the reward ranges from 0 to 1. Do you know what are the advantages of using a stochastic decoder and when to use it? submitted by /u/ZioFranco1404 [link] [comments]
    Reinforcement learning on steam games
    Does anyone have any idea how to get game details such as character movements, environment information using api calls, as I want to use to do my reinforcement learning. submitted by /u/Important_Ad_55 [link] [comments]
  • Open

    English learners can now practice speaking on Search
    Posted by Christian Plagemann, Director, and Katya Cox, Product Manager, Google Research Learning a language can open up new opportunities in a person’s life. It can help people connect with those from different cultures, travel the world, and advance their career. English alone is estimated to have 1.5 billion learners worldwide. Yet proficiency in a new language is difficult to achieve, and many learners cite a lack of opportunity to practice speaking actively and receiving actionable feedback as a barrier to learning. We are excited to announce a new feature of Google Search that helps people practice speaking and improve their language skills. Within the next few days, Android users in Argentina, Colombia, India (Hindi), Indonesia, Mexico, and Venezuela can get even more langua…  ( 94 min )
  • Open

    What’s Your Story: Ranveer Chandra
    In this new Microsoft Research Podcast series What’s Your Story, Lab Director Johannes Gehrke explores the who behind the technical and scientific advancements helping to reshape the world. He talks to members of the research community at Microsoft about what motivates their work and how they got where they are today.  Ranveer Chandra is Managing […] The post What’s Your Story: Ranveer Chandra appeared first on Microsoft Research.  ( 31 min )
  • Open

    DALL·E 3 is now available in ChatGPT Plus and Enterprise
    We developed a safety mitigation stack to ready DALL·E 3 for wider release and are sharing updates on our provenance research.  ( 3 min )
  • Open

    To excel at engineering design, generative AI must learn to innovate, study finds
    AI models that prioritize similarity falter when asked to design something completely new.  ( 10 min )
  • Open

    Bluesky
    I saw a comment from Christos Argyropoulos on Twitter implying that there’s a good scientific community on Bluesky, so I went there and looked around a little bit. I have account, but I haven’t done much with it. I was surprised that a fair number of people had followed me on Bluesky even though I […] Bluesky first appeared on John D. Cook.  ( 5 min )

  • Open

    I finally have enough ai tools and here is my complete list
    Youtube Tools Eightify Steve Al Glasp ClipMaker TubeBuddy Thumbly ​ Sales Tools Lavendar Warmer Octane Twain Regie Simplified ​ Productivity Tools Bardeen Al Paperpal Consensus Al Writesonic ChartGPT Scholarcy ​ Music Tools Muzeek Brain FM Amper Melodrive Jukedeck Boomy ​ Writing Tools AISEO Quillbot Simplified Writesonic Bertha Al Jasper Al ​ Coding Tools 10WEB Durable Al Deepcode Akkio Replit GitHUb Copilot ​ Chatbots Tools Yatterplus Typewise Quickchat Cohere Kaizan GPTBuddy ​ Daily life Tools Notion Al Taskade TLVD Vondy Al Bardeen Al Eessel ​ Content Creation Tools Writesonic Tome Al Beautiful Al ChartGPT ChatABC Steve Al ​ Twitter Tools Postwise Tweet Hunter TribeScaler Tweetlify Tweetmonk Hypefury ​ Images Tools StockIMG Mid Journey Leonardo Al Bing Al Autodraw Microsoft Designer ​ Chrome Extensions Alicent Compose Al Poised Al Voila Al Wiseone  I'm just sharing my experiences and observations in the field of ai. LIST AND SITE submitted by /u/PerceptionPlayful469 [link] [comments]  ( 9 min )
    How to use AI being a teacher
    Hello guys, Im an english student and I have been teaching to my teacher about how to use chat gpt and the wide variety of AI in the classroom and in her job. She told me that i change her life showing her this things. And i have others teacher asking me how can use this technology for their jobs. So i have a question for you guys, do you have some ideas about how a teacher can use this things? Maybe you have some experiences or ideas that I’ve never thought. submitted by /u/Odd_Solution7099 [link] [comments]  ( 9 min )
    Best AI image generator for B2B SaaS websites?
    Rebuilding a low quality B2B SaaS product site and I'd prefer to use an AI image generator that will produce high quality unique images for each of the sections on our website that are consistent with our brand and generated to match the copy the image is supporting. Output of the image should work for a responsive web design. Anything out there that does this? submitted by /u/DumpTrumpGrump [link] [comments]  ( 9 min )
    Is there an AI site or app that can change the instrument in each stem track of a song?
    Any help would be appreciated. submitted by /u/J97051 [link] [comments]  ( 9 min )
    Meta Announces New Method for Real-Time Decoding of Images from Brain Activity
    Brain decoding tech has improved a lot recently thanks to AI/ML, enabling reading out visual perceptions from fMRI brain scans. But fMRI is too slow for real-time BCIs. A new study from Meta's AI research team pushes brain reading into real-time using MEG, which measures whole-brain activity at super-fast millisecond resolution. They built a 3-part pipeline to decode MEG signals: Embed images into latent spaces using pretrained models like CLIP. Train MEG-specific ConvNet to predict embeddings from MEG data. Generate images from MEG embeddings with diffusion model. They tested it on 20k+ natural images. MEG decoding was 7X better than old methods, hitting 70% top-5 accuracy in retrieving the right images. Generated images matched semantics decently but lacked fine visual details compared to fMRI. MEG seems more focused on high-level category info whereas fMRI captures more low-level features. This could enable visual BCIs for paralysis, etc. ... honestly, a world where we can decode brain images in real time is pretty crazy. The findings also raise some important ethical considerations around privacy of decoded mental content... (wow, that was a weird sentence to write!). TLDR: New MEG pipeline decodes dynamic visual data from brain activity in real-time. Good but not yet photorealistic-quality image generation. Full summary here. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    A 'Godfather of AI' Calls for an Organization to Defend Humanity
    Yoshua Bengio, a pioneer in artificial neural networks and deep learning, calls for an organization to defend humanity against the potential threats of artificial intelligence. He believes that AI could achieve human levels of cognitive competence within a few years or decades, which raises concerns about democracy, national security, and our collective future. Bengio reflects on his own work and the importance of addressing the existential risks posed by AI. He acknowledges that these risks were not taken seriously until recently and discusses the taboo surrounding the topic in the AI research community. Source : https://thebulletin.org/2023/10/ai-godfather-yoshua-bengio-we-need-a-humanity-defense-organization/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Tutorial: Benchmarking Bark text-to-speech on 26 Nvidia GPUs - Reading out 144K recipes
    In this project, we benchmarked Bark text-to-speech across 26 different consumer GPUs. The goal: To get Bark to read 144K food recipes from Food.com's recipe dataset. You can read the full tutorial here: https://blog.salad.com/bark-benchmark-text-to-speech/ Included: Architecture diagram, data preparation, inference server setup, queue worker, setting up container group & compiling the results Code-blocks included in the tutorial. Words per dollar for each GPU: Words per dollar comparison or each GPU Although the latest cards are indeed much faster than older cards at performing the inference, there’s really a sweet spot for cost-performance in the lower end 30xx series cards. Conclusions As is often the case, there’s a clear trade-off here between cost and performance. Higher end cards are faster, but their disproportionate cost makes them more expensive per word spoken. The model’s median speed is surprisingly similar across GPU types, even though the peak performance can be quite different. No matter what GPU you select, you should be prepared for significant variability in performance. Qualitative: While bark’s speech is often impressively natural sounding, it does have a tendency to go off script sometimes. We’ve also made available audio from 1000 top-rated recipes, paired with the script it was trying to read. submitted by /u/SaladChefs [link] [comments]  ( 9 min )
    I took the whole of Massive Attack's 'Safe From Harm' music video and put it through AnimateDiff / ControlNet with a futuristic / robot prompt.
    submitted by /u/glenniszen [link] [comments]  ( 9 min )
    Inflection AI’s Pi has to be the dumbest ‘corporate’ LLM and only model to not improve since day one.
    I remember at launch how it was telling everyone it was based on Open AIs GPT-3 architecture, and now it’s still hallucinating just as much referring to itself as ‘Bing Chat’ and providing fake links even though it now has access to the internet. I actually don’t understand how you can be such a large company and make no improvements in 6 months, which is an eternity in AI. submitted by /u/sardoa11 [link] [comments]  ( 9 min )
    Researchers Just Found Something Terrifying About Talking to AI Chatbots
    New research suggests that AI chatbots can infer personal information about users based on minor context clues. The large language models (LLMs) behind chatbots like OpenAI's ChatGPT and Google's Bard are trained on publicly-available data, which can be used to identify sensitive information about someone. The research found that OpenAI's GPT-4 was able to correctly predict private information about users 85 to 95 percent of the time. For example, the LLM correctly identified that a user was based in Melbourne, Australia based on a mention of the term 'hook turn,' which is a traffic maneuver specific to Melbourne. The research also suggests that chatbots could potentially infer a user's race based on offhanded comments. This raises concerns about internet privacy and the potential misuse of personal data by advertisers or hackers. Source : https://futurism.com/the-byte/ai-chatbot-privacy-inference submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Anime, AI & Censorship
    Is their an AI tool that can go over Anime episodes/films to turn chinas white anime censorship back to red? Possibly frame by frame segmenting the blood🩸 submitted by /u/Phantasius224 [link] [comments]  ( 9 min )
    GPT 4 DUDE MAKING REFLEXIONS IN SVG WHAT....WOW
    submitted by /u/the_anonymizer [link] [comments]  ( 8 min )
    One-Minute Daily AI News 10/17/2023
    NVIDIA NeMo SteerLM lets companies define knobs to dial in a model’s responses as it’s running in production, a process called inference. Unlike current methods for customizing an LLM, it lets a single training run create one model that can serve dozens or even hundreds of use cases, saving time and money.[1] According to an official release, Dell Technologies held a “Bringing AI to data” Asia Pacific and Japan (APJ) media briefing this week.[2] Baidu Says Its AI as Good as ChatGPT in Big Claim for China.[3] Roman Scrolls were illegible for 2,000 years. A college student read one with AI.[4] How often you think about the roman empire? Sources: [1] https://blogs.nvidia.com/blog/2023/10/11/customize-ai-models-steerlm/ [2] https://www.financialexpress.com/business/digital-transformation-dell-technologies-to-expand-its-ai-services-3274790/ [3] https://www.bloomberg.com/news/articles/2023-10-17/baidu-says-its-ai-as-good-as-chatgpt-s-in-bold-claim-for-china?embedded-checkout=true [4] https://www.washingtonpost.com/nation/2023/10/17/herculaneum-scrolls-contest-translated-deciphered/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
  • Open

    MuJoCo with OpenAI gym
    Hello, I'm trying to use OpenAI's spinning up to learn about RL. Spinning up requires OpenAI gym, instead of the new gymnasium package. Trying to install MuJoCo with gym, I'm getting an error that I'm missing a MuJoCo liscense key. But MuJoCo is free now, right? So what is the status with backward compatibility with it? Is there some global license key that can be used? Or is it simply not backward compatible? Thanks a lot. submitted by /u/mega_monkey_mind [link] [comments]  ( 9 min )
    DQN in a non markovian environment
    Hello there, I am working on a school project in which we want to implement a RL algorithm on a simple problem. The goal is to maximise the heart rate of a person using a vibrator by setting its frequency. We wrote a simulator that outputs the new heart rate based on the vibration frequency. It implements several different classes of users: for example one for which the heart rate increases when the vibration frequency stays the same, another that prefers when it increases over time, etc. We determined that we need to have as a state the current heart rate but also a table of the k previous heart rates and the actions associated. Without that memory, we would not be able to tell apart the different profiles as in the same state, we would need to do different actions to satisfy them both. We then have a correlation between previous samples and the action we make at current state, which I have read makes the problem non markovian. Is there a way to solve this problem using a DQN algorithm, given that we need to memorize the previous samples linearly which seems to go against the algorithm behavior and the usage of a replay memory? Are there more suited algorithms? submitted by /u/Outrageous-Subject38 [link] [comments]  ( 9 min )
    Best Books to Learn Reinforcement Learning
    submitted by /u/Lakshmireddys [link] [comments]  ( 9 min )
    "gp.t: Learning to Learn with Generative Models of Neural Network Checkpoints", Peebles et al 2022
    submitted by /u/gwern [link] [comments]  ( 9 min )
    Autonomous Driving: Ellipsoidal Constrained Agent Navigation | Swaayatt Robots | Motion Planning Research
    submitted by /u/shani_786 [link] [comments]  ( 10 min )
    DQN Agent stuck at local Minima (Probably)
    I'm attempting to address a Day Ahead Electricity Market bidding problem. The concept revolves around purchasing electricity during the lowest price hours and selling it during the highest price hours to maximize profit. I possess 5 years of data featuring variables such as predicted wind speed, predicted temperature, predicted net load, predicted price, and more. I'm employing reinforcement learning and have made attempts to implement Deep Q Learning using the stablebaselin3 library. Each episode consists of 24 steps, corresponding to the 24 hours in a day, with each step representing the progression to the next hour. The ultimate objective is to maximize profits by the end of the day. ​ Here are the configuration settings: - Learning rate: 0.0001 - Gamma: 1.0 - Exploration start: 1.…
    6DOF Simulation RL Capability
    I have a 6DOF simulink model of a Autonomous underwater vehicle that has properties [u v w p q r x y z phi theta psi] and two inputs [theta1 theta2] that govern the angle of control surfaces. Ocean current and depth are taken into account. How feasible would it be to use RL to reach waypoints at various [x, y, z] positions? I have a feeling hyper paremeter tuning might play a larger role in this? I expect training times to increase exponentially as well? I have done this using a single randomly spawned waypoint with a simple Unicycle Kinematic model, in both simulink/matlab and python with a vectorized/parallel environment using SB3/PettingZoo/Gym. submitted by /u/VisionZUS [link] [comments]
    Recommended 'seeding' approach when training/evaluating an experiment
    Dear all, As part of my studies, I am running some RL experiments in which I want to compare some different catastrophic forgetting approaches in sequential task learning. I am using PPO as a baseline. What is the usual experimental setting in relation to seeds used during training and evaluation? If I do for example 3 trainings for a given approach using a different seed for each training, what is the best way of doing the evaluation afterwards? Let's say I have Approach/algorithm A -> train 3 times with 3 seeds -> model_A1, model_A2, model_A3 Then I would like to use 3 different seeds for the evaluation, so to evaluate each of the previously trained models over a set of episodes (deterministic) for each evaluation seed, and get averaged rewards (or median). I wonder whether I might be over complicating things, so I would like to ask you for suggestions. To give a bit of context, this is not intended for a paper, but as part of my master studies, so conditions are a bit more relaxed. Thanks in advance for your insights and suggestions submitted by /u/cotorritaloca80 [link] [comments]
  • Open

    [D] Combining data transformation and scaling techniques
    I am cleaning a dataset for a (macro-economic) demand forecast, and I'm wondering when one should apply data transformation. When is it recommended to include Box-Cox or Yeo-Johnson, and how should we choose between the two? How does it effect the feature selection or model performance? Additionally, how should we select the appropriate scaling technique (normalizing, standardizing, min-max) and does the order in which we transform and scale matter for our data? Is there any recommended literature on this? submitted by /u/Ambitious-Pay6329 [link] [comments]  ( 9 min )
    [D] GPU-compatible SNN-libraries in 2023?
    Hello, I am currently using snnTorch for a video classification task and I achieve fine results, however the training process is really, really slow. I was hoping to utilize my GPU for this task, and while there seem to be alternatives I was hoping to see if anyone will vouch for any of these, or different one: https://github.com/norse/norse https://github.com/BindsNET/bindsnet https://github.com/fangwei123456/spikingjelly https://github.com/UCI-CARL/CARLsim6 My priorities are in order: Windows support Potential transferability to in-memory compute hardware PyTorch compability submitted by /u/SlayahhEUW [link] [comments]  ( 9 min )
    [R] Meta AI: Towards a Real-Time Decoding of Images from Brain Activity
    Brain decoding tech has improved a lot recently thanks to AI/ML, enabling reading out visual perceptions from fMRI brain scans. But fMRI is too slow for real-time BCIs. A new study from Meta's AI research team pushes brain reading into real-time using MEG, which measures whole-brain activity at super-fast millisecond resolution. They built a 3-part pipeline to decode MEG signals: Embed images into latent spaces using pretrained models like CLIP. Train MEG-specific ConvNet to predict embeddings from MEG data. Generate images from MEG embeddings with diffusion model. They tested it on 20k+ natural images. MEG decoding was 7X better than old methods, hitting 70% top-5 accuracy in retrieving the right images. Generated images matched semantics decently but lacked fine visual details compared to fMRI. MEG seems more focused on high-level category info whereas fMRI captures more low-level features. This could enable visual BCIs for paralysis, etc. ... honestly, a world where we can decode brain images in real time is pretty crazy. The findings also raise some important ethical considerations around privacy of decoded mental content... (wow, that was a weird sentence to write!). TLDR: New MEG pipeline decodes dynamic visual data from brain activity in real-time. Good but not yet photorealistic-quality image generation. Full summary here. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [D] Can someone ELI5 the birch clustering algorithm?
    https://scikit-learn.org/stable/modules/generated/sklearn.cluster.Birch.html I'm looking at the parameters here and I'm confused on how there is no distance metric? What is assumed about the data going in if there is no distance metric or precomputed distance option? For example, can I run this with binary data (1/0), what about data w/ missing values? Does it assume the samples are normally distributed? submitted by /u/o-rka [link] [comments]  ( 9 min )
    [R] xVal: A Continuous Number Encoding for Large Language Models - The Polymathic AI Collaboration 2023 - Using the numbers directly instead of tokenizing them increases performance significantly!
    Paper: https://arxiv.org/abs/2310.02989 Twitter discussion: https://x.com/andrew_n_carr/status/1714326003030638848?s=20 Shows in my opinion that tokenizers are clouding the understanding of LLMs and that using the data directly is better. https://x.com/karpathy/status/1657949234535211009?s=20 Karpathy thinks the same! Abstract: Large Language Models have not yet been broadly adapted for the analysis of scientific datasets due in part to the unique difficulties of tokenizing numbers. We propose XVAL, a numerical encoding scheme that represents any real number using just a single token. XVAL represents a given real number by scaling a dedicated embedding vector by the number value. Combined with a modified number-inference approach, this strategy renders the model end-to-end continuous when considered as a map from the numbers of the input string to those of the output string. This leads to an inductive bias that is generally more suitable for applications in scientific domains. We empirically evaluate our proposal on a number of synthetic and real-world datasets. Compared with existing number encoding schemes, we find that XVAL is more token-efficient and demonstrates improved generalization. https://preview.redd.it/qq8u066smzub1.jpg?width=1344&format=pjpg&auto=webp&s=498be8488c00147f0a7443050519dcf535fae126 https://preview.redd.it/dxqd4wpsmzub1.jpg?width=1499&format=pjpg&auto=webp&s=266689a80b31cb31fdc4167043f7abdb4f683100 https://preview.redd.it/0yy93xpsmzub1.jpg?width=1497&format=pjpg&auto=webp&s=b5eae8b958f03afc3c8c85a95c115e48aed1d06e submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [P] A Guide to Building LLM-Based Applications with Code Llama
    Have you ever wondered about how to take advantage of the power of large language models (LLMs) and Generative AI at the edge? Our latest blog, A Guide to Building LLM-Based Applications with Code Llama, shows you how you can use Code Llama on an edge device to build a customized dashboard application. This tutorial shows how Code Llama can empowering analysts in remote, restricted environments to build applications in environments with minimal connectivity and compute capacity. In this tutorial, we’ll walk you through how to run code Llama on an edge device in a remote location to build a customized dashboard application. submitted by /u/modzykirsten [link] [comments]  ( 9 min )
    [R] LLMs can threaten privacy at scale by inferring personal information from seemingly benign texts
    Our latest research shows an emerging privacy threat from LLMs beyond training data memorization. We investigate how LLMs such as GPT-4 can infer personal information from seemingly benign texts. The key observation of our work is that the best LLMs are almost as accurate as humans, while being at least 100x faster and 240x cheaper in inferring such personal information. We collect and label real Reddit profiles, and test the LLMs capabilities in inferring personal information from mere Reddit posts, where GPT-4 achieves >85% Top-1 accuracy. Mitigations such as anonymization are shown to be largely ineffective in preventing such attacks. Test your own inference skills against GPT-4 and learn more: https://llm-privacy.org/ Arxiv paper: https://arxiv.org/abs/2310.07298 WIRED article: https://www.wired.com/story/ai-chatbots-can-guess-your-personal-information/ submitted by /u/bmislav [link] [comments]  ( 9 min )
    [D] GAN that manipulates shape, texture, color, position, angle
    I remember seeing a paper on manipulating or changing an objects attributes, it came out rather recently and seemed to work really well. But I just can’t find it anymore. All I know of is the „Counterfactual Generative Networks“ by A. Sauer & A. Geiger (2020) I’d really appreciate it if anyone can share similar work. Especially if causally motivated submitted by /u/Glittering_teapot [link] [comments]  ( 9 min )
    [P] Best Way to Create a Custom Chatbot from Personal Data (PDF, etc.)
    Hello fellow Redditors! I am looking for some guidance on creating a custom chatbot using my own data, which is currently in PDF format. I've explored various options like Azure, Pinecone, and I've heard about the AskYourPDF API, but I'm not sure which one would be the best fit for my project. I want to keep things simple, so I'm reaching out to the community to ask for recommendations or advice on the easiest and most effective way to build a website with a personalized chatbot. If you have experience with similar projects or know about user-friendly tools or platforms, please share your insights. I appreciate any suggestions, tips, or pointers you can provide. Thank you in advance for your help! TL;DR: Need advice on the simplest way to create a website with a personalized chatbot using my own data (PDF format). Seeking recommendations and tips from the community. ​ Thank you! submitted by /u/Huge-Number-4299 [link] [comments]  ( 9 min )
    [P] Where do I gather the dataset for my FYP
    I am doing a Machine Learning project for my FYP; I haven't worked on any ML project yet but I am excited about it. It is related to voice/facial emotion detection. is there any platform that provides datasets for ml projects? Like without any copyright issues (if that's even a thing in ml datasets idk?) A total beginner here. submitted by /u/fewdiepie_ [link] [comments]  ( 9 min )
    [P] I made a finetune of CodeLlama to resolve merge conflicts!
    I made a finetune of CodeLlama-7b for resolving merge conflicts following up on an IEEE study from 2022. The demo is here if anyone wants to check it out and give some feedback. It would help a ton for future versions improving the dataset and going forward with the 13b and 34b models submitted by /u/codys12 [link] [comments]  ( 9 min )
    [Discussion] how much 'error' should i apply when training with synthetic data?
    hi there ​ i'm trying to build a small ai that formats texts. ​ of course the current formatting applications applied on ide, search engine, ms softwares, notetaking apps are well functioning, but this is more for educational purpose & self interest. ​ since i don't have infinite amount of time and money, i'm thinking of using open sourced text data and generate synthetic data using gpt3.5 or somekind of algorithm to unformat them. ​ so this is the part where i'm stuck. when adding some errors such as inappropriate multilines, tabs, typos, how much should i add on to? ​ it would be best if i knew somekind of distribution of text errors people make on everyday life, but i don't have any. ​ i don't want to make this training too hard so i'm not really thinking to destroy the text, but rather add some appropriate level of errors. ​ but, would it help this ai model to learn better if i add extra errors? ​ or is this all just something i would have to figure out by myself? ​ any comments would be appreciated! submitted by /u/Strange_Dog8104 [link] [comments]  ( 9 min )
    [R] Open-source video translate solutions
    Hi there! are there any open-source solutions for video translation? i mean replacing video's audio stream with translated one in different language (which is in sync with the picture) - not necessarily alter mouth movements in the video. submitted by /u/curryprogrammer [link] [comments]  ( 9 min )
    [Research] Literature survey query
    Survey papers Hi all, First time posting here. I am doing my PhD in Language Conditioned Robotics. I am currently writing a literature review paper on the current state of the field and how it can be further improved. I am covering topics such as generative AI and LLMs in there. I would be more than grateful if you could send some literature review papers in the field of ML so I understand how to structure and write my paper and also what I should focus on mode. It doesn't necessarily have to be related to my PhD topic (but if they are it will help quite a bit). I would be more than happy if anyone can also share their experience. Thank you for your time! submitted by /u/bizzonkiller [link] [comments]  ( 9 min )
    [D] What are some of the best library frameworks to use for speech2text and text2speech AI chatbot
    Hey guys, what are some of the best library or libraries to use to make a voice conservational AI chatbot? I googled around and found Vocode. They look pretty good. However Vocode rely on several other (paid) closed sourced libraries such as Deepgram (for transcribing) and Azure AI Speech (for synthesising). Are there any other libraries/frameworks available out there which are completely or more open sourced? submitted by /u/redd-dev [link] [comments]  ( 9 min )
    [R] EFFICIENT STREAMING LANGUAGE MODELS WITH ATTENTION SINKS
    submitted by /u/Username912773 [link] [comments]  ( 9 min )
    6DOF Sim RL Capability [P]
    I have a 6DOF simulink model of a Autonomous underwater vehicle that has properties [u v w p q r x y z phi theta psi] and two inputs [theta1 theta2] that govern the angle of control surfaces. Ocean current and depth are taken into account. How feasible would it be to use RL to reach waypoints at various [x, y, z] positions? I don’t want to use a PID controller or anything, not even RL to tune a controller. The agent would choose the theta inputs directly. I have a feeling hyper paremeter tuning might play a larger role in this? I expect training times to increase exponentially as well? I have done this using a single randomly spawned waypoint with a simple Unicycle Kinematic model, in both simulink/matlab and python with a vectorized/parallel environment using SB3/PettingZoo/Gym. submitted by /u/VisionZUS [link] [comments]  ( 9 min )
    [R] BitNet: Scaling 1-bit Transformers for Large Language Models
    Arxiv link – BitNet: Scaling 1-bit Transformers for Large Language Models In this work, we introduce BitNet, a scalable and stable 1-bit Transformer architecture designed for large language models. Specifically, we introduce BitLinear as a drop-in replacement of the nn.Linear layer in order to train 1-bit weights from scratch. Experimental results on language modeling show that BitNet achieves competitive performance while substantially reducing memory footprint and energy consumption, compared to state-of-the-art 8-bit quantization methods and FP16 Transformer baselines. Furthermore, BitNet exhibits a scaling law akin to full-precision Transformers, suggesting its potential for effective scaling to even larger language models while maintaining efficiency and performance benefits. submitted by /u/PantsuWitch [link] [comments]  ( 9 min )
    [P] Achieving peak performance on GPU
    Hi r/MachineLearning! I recently went into the CUDA programming rabbit hole. In the process, I came across matrix multiplication and was amazed by how complicated the algorithm is in CUDA (especially if you want to get the best performance). I found the learning process quite gruelling (the CUDA docs were very average), so I wrote a tiny blog which hopefully helps anyone in the same position. You can read the blog on Medium (no paywall) or HackMD. It would probably be quite useful if you want to get a deeper intuition of how things like OpenAI Triton or FlashAttention work under the hood. Accompanying this is an implementation of a 3-hidden-layer MLP trained on MNIST in pure CUDA. Benchmarking this against PyTorch, it gets up 6x higher end-to-end training speed for small (h=128) networks, and asymptotically 20% faster for large (h=8192) ones! https://preview.redd.it/txx2txbvlzub1.png?width=2400&format=png&auto=webp&s=7bb136b9fb535bc58fd7ee809bbbca6f68dc8953 It's worth noting that I tried reasonably hard optimising the PyTorch implementation by using full fp16, torch.compile with fullgraph=True, mode="max-autotune", and pre-loading all data to GPU up-front (I also did this for the CUDA implementation). The main takeaways I got are: For small networks, PyTorch/Python still incurs a significant overhead, even if you try pretty hard to optimise it. For large networks, most of the speedup comes from using fp16 accumulation for matrix multiplication (instead of PyTorch's fp32). This obviously reduces stability, but at least in my case, I didn't observe any numerical issues. In cases where we can get away with fp16, we might be leaving a significant amount of performance on the table! Anecdotally, you have to try really hard in CUDA to even get close to the performance of PyTorch, but it is possible to beat it if you try hard (suffer) enough. You can check out the repo here: https://github.com/andylolu2/cuda-mnist. Would love to hear some feedback! submitted by /u/bjergerk1ng [link] [comments]  ( 10 min )
  • Open

    Measurement-induced entanglement phase transitions in a quantum circuit
    Posted by Jesse Hoke, Student Researcher, and Pedram Roushan, Senior Research Scientist, Quantum AI Team Quantum mechanics allows many phenomena that are classically impossible: a quantum particle can exist in a superposition of two states simultaneously or be entangled with another particle, such that anything you do to one seems to instantaneously also affect the other, regardless of the space between them. But perhaps no aspect of quantum theory is as striking as the act of measurement. In classical mechanics, a measurement need not affect the system being studied. But a measurement on a quantum system can profoundly influence its behavior. For example, when a quantum bit of information, called a qubit, that is in a superposition of both “0” and “1” is measured, its state will sudde…  ( 94 min )
  • Open

    Institute Professor Daron Acemoglu Wins A.SK Social Science Award
    The award honors research on public policy with a focus on economic and governmental reforms.  ( 7 min )
  • Open

    Optimize pet profiles for Purina’s Petfinder application using Amazon Rekognition Custom Labels and AWS Step Functions
    Purina US, a subsidiary of Nestlé, has a long history of enabling people to more easily adopt pets through Petfinder, a digital marketplace of over 11,000 animal shelters and rescue groups across the US, Canada, and Mexico. As the leading pet adoption platform, Petfinder has helped millions of pets find their forever homes. Purina consistently […]  ( 9 min )
  • Open

    Understanding the user: How the Enterprise System Usability Scale aligns with user reality
    This position research paper was presented at the 26th ACM Conference on Computer-Supported Cooperative Work and Social Computing (opens in new tab) (CSCW 2023), a premier venue for research on the design and use of technologies that affect groups, organizations, and communities. In the business world, measuring success is as critical as selecting the right […] The post Understanding the user: How the Enterprise System Usability Scale aligns with user reality appeared first on Microsoft Research.  ( 10 min )
  • Open

    NVIDIA Expands Robotics Platform to Meet the Rise of Generative AI
    Powerful generative AI models and cloud-native APIs and microservices are coming to the edge. Generative AI is bringing the power of transformer models and large language models to virtually every industry. That reach now includes areas that touch edge, robotics and logistics systems: defect detection, real-time asset tracking, autonomous planning and navigation, human-robot interactions and Read article >  ( 8 min )
    Making Machines Mindful: NYU Professor Talks Responsible AI
    Artificial intelligence is now a household term. Responsible AI is hot on its heels. Julia Stoyanovich, associate professor of computer science and engineering at NYU and director of the university’s Center for Responsible AI, wants to make the terms “AI” and “responsible AI” synonymous. In the latest episode of the NVIDIA AI Podcast, host Noah Read article >  ( 6 min )
    Into the Omniverse: Marmoset Brings Breakthroughs in Rendering, Extends OpenUSD Support to Enhance 3D Art Production
    Real-time rendering, animation and texture baking are essential workflows for 3D art production. Using the Marmoset Toolbag software, 3D artists can enhance their creative workflows and build complex 3D models without disruptions to productivity.  ( 7 min )
    Foxconn and NVIDIA Amp Up Electric Vehicle Innovation
    NVIDIA founder and CEO Jensen Huang joined Hon Hai (Foxconn) Chairman and CEO Young Liu to unveil the latest in their ongoing partnership to develop the next wave of intelligent electric vehicle (EV) platforms for the global automotive market. This latest move, announced today at the fourth annual Hon Hai Tech Day in Taiwan, will Read article >  ( 6 min )
  • Open

    Portable sed -i across MacOS and Linux
    The -i flag to ask sed to edit a file in place works differently on Linux and MacOS. If you want to create a backup of your file before you edit it, say with the extension .bak, then on Linux you would run sed -i.bak myfile but for the version of sed that ships with […] Portable sed -i across MacOS and Linux first appeared on John D. Cook.  ( 6 min )
  • Open

    Best Books to Learn Neural Networks in 2023 for Beginners (Updated) -
    submitted by /u/Lakshmireddys [link] [comments]

  • Open

    Roughly how much time will a task running on a RTX 3060 take VS a ~i7 CPU? [Discussion]
    Anyone have examples of tasks run between the two? Doesn't need to be exact. submitted by /u/Apita2000 [link] [comments]  ( 8 min )
    [D] Feedback on my MVP project - Pre-Recorded Standardized Video Interviews Job Site for Data Professionals
    Hey! ​ Startup: - Apply Script dot com "Connect business and data professionals via pre-recorded standardized video interviews." ​ More details: ​ Problems with Traditional Hiring ​ - Outdated: The current method of conducting interviews has become overly complex and outdated. - Time-Wasting: The process involves too many appointments, meetings, and stages, leading to communication errors. - Expensive: The man-hours invested by HR and engineering teams are costly. - Constraining: Interviews are fixed to specific times and locations. - Cumbersome: The experience is challenging for both businesses and professionals. ​ Our Solution ​ + Talent Identification: We find top talent that matches your job post. + Standardized Interviews: Professionals standardized pre-record their …  ( 9 min )
    [D] Help identifying research papers for online / cyclic / sequential learning?
    So my situation is that I have a pretrained model and we get a new update of data every month (note: this monthly data is very small compared to the original dataset, the original dataset was about 5 years worth, or ~60x the size of any given monthly update), how can I update my pretrained model on the much smaller set of new data, learning from the data without overfitting to that data? Or frankly, what would be better if it is possible, would be to extend my pretrained model such that it learns from the new data and then can be more tightly fit to that month's data. So something like meta-learning or local fine-tuning, but I want to continue to update and improve my pretrained model so that I have a base model that can do well on each month's new data. Does anyone know anything like this, or have advance for terms to look into, beyond just transfer learning or regularization? submitted by /u/Amun-Aion [link] [comments]  ( 9 min )
    How to properly implement Cover's Theorem in an SVM? [P]
    Maybe this belongs elsewhere since it's probably a dumb basic question, but basically I'm taking an undergrad course in AI and we've been given a classification problem. We were told as a "hint" to recall Cover's Theorem when separation fails, but the issue is she also wants us to draw a rough sketch of the data with the separator. Mine failed in a basic scatterplot so I upped the dimension by 1 but it also wasn't separable in R3 (which is annoying to draw anyway but could have been done), if I keep going then it might work at some point but idk how I'm meant to draw the data if it's separated in R4 or beyond. If it works in R4 do I just sketch the data in R3 and just draw a 3 dimensional point where w = 0? But even then if it goes beyond R4 it becomes way more annoying. So I'm assuming my implementation is just wrong, maybe the formula I used was wrong. Can someone show what a proper implementation looks like and how we're meant to up dimensions? Don't wanna post what I tried bc it has starter code and stuff baked into it which might allow my professor to find this post 😂 submitted by /u/Traditional_Land3933 [link] [comments]  ( 9 min )
    [D] Cross Entropy Classification vs Metric Learning + k-NN for image classification?
    Hi guys. We've all seen how hot RAG and vector DBs have been lately. How good are retrieval-based approaches for image classification? More concretely: Suppose we have a network trained with metric learning and a massive, diverse set of labelled examples to retrieve from. We've just been tasked to do classification with a fixed number of classes, and we've narrowed it down to two options: Embed our dataset using our metric learning network, throw the embeddings into a vector DB, and do k-NN Train a classifier via cross-entropy loss Which approach would we expect to provide better performance? What are the trade-offs? Any insight is appreciated! submitted by /u/supersmartypants [link] [comments]  ( 9 min )
    [D] Graph Neural Networks - Links Prediction Task on Directed, Heterogenous Multigraphs
    Hi guys, I have the following use case at hand for my thesis, and I'd like to ask for some help to formulate my problem: A directed multigraph (1 node type, multiple edge types) Each node and edge have their own attributes A set of graphs that are fully labeled. The dataset is self-created according to some technical rules. Training is supposed to be done on this dataset. My task is to perform link prediction in the inductive setting. This means that given an unseen incomplete graph at the inference time, the model should be able to predict all the missing links. I have read many papers and tried to formulate my problem in many directions. Since I am also new to GNNs, I would prioritize papers with an existing codebase and sound theoretical justifications for the techniques (which …  ( 10 min )
    Trouble improving accuracy in face recognition dataset [P]
    Hey everyone Im trying my hands with the The Labeled Faces in the Wild face recognition dataset, for a face recognition task. I have made a siamesemodel, and my loss curve is looking great but my accuracy stays at 0.500, for everything i have tried. Is there anybody in here that have tried their hands with this task before that can give me some tips to improve my accuracy. I am implementing it in python with PyTorch btw Thanks in advance! submitted by /u/Due_Concentrate1279 [link] [comments]  ( 9 min )
    How valuable is a PhD in science (with applied ML) compared to a PhD in only Machine learning [D]
    Is it more advantageous to pursue a PhD in machine learning with a focus on scientific applications for example (Machine learning for drug design) if the end goal is to work in the machine learning industry? Or is a general PhD in machine learning more valuable for this career path? Thank you submitted by /u/Neat-Print2792 [link] [comments]  ( 9 min )
    [R] 85% of the variance in language model performance is explained by a single factor (g, a unified measure of LLM ability)
    TL;DR and paper link are at the bottom of the post. I'm an undergrad who just wrote my first paper completely solo. Crazy experience with so many highs and lows, but I learned a lot from it. I think the results are important and I want people to see them, so I'll try to walk through the paper here as best as I can. I also have a small request for Arxiv enjoyers at the end. Given the nature of Reddit posts, I'll focus a bit less on the methods and more on the results. I won't cite stuff here either, but obviously you can find citations in the paper. First I'll give a small bit of historical context to what I'm doing, then walk through what I did and what came of it. Enjoy the read. The general intelligence factor in humans In the early 1900s, Charles Spearman observed that children's …  ( 14 min )
    [D] How to design API of Machine learning library
    In the past nine years of my deep learning journey, I have come across a vast number of frameworks. Lua Torch was a fantastic framework that initially died due to a lack of Python's ecosystem, but then rose again as PyTorch. Theano was also a great framework, but its major drawback was difficult debugging. I remember spending two weeks writing a Neural Turing Machine for solving bAbI tasks on theano. (Nowadays, it would take a couple hours on Pytorch). Tensorflow - I still don't understand what that was, a terrible framework. There was also Caffe, which was popular in computer vision. Julia is another language that attempted to introduce automatic differentiation as a built-in feature. And JAX, which I was originally biased against since it's a Google product. But some close friends persuaded me to try it, and I actually liked it. However, I thought that it would be difficult for JAX to gain widespread adoption in the community, as PyTorch already had a strong network effect and was gaining traction quickly. I didn't see how anyone could catch up with PyTorch. Another issue with JAX is that it requires additional cognitive load for developers. Take a look: https://higgsfield.substack.com/p/how-to-design-api-of-machine-learning submitted by /u/Good-Willingness-985 [link] [comments]  ( 9 min )
    [P] 2D Gaussian Splatting a great starting point for people who want to delve deeper
    Github : https://github.com/OutofAi/2D-Gaussian-Splatting https://i.redd.it/cwgsjtko1sub1.gif submitted by /u/TerryCrewsHasacrew [link] [comments]  ( 8 min )
    [D] How to Build Data Products? Deploy: Part 3/4 - Doubling down on the power of Unified Experiences for building state of the art models.
    Data products plays an important role in building state of the art machine learning models. Though their building process seems a bit confusing within industry as of now, this article series tries to simplify it by breaking it and explaining it into 4 steps. Take a look: https://moderndata101.substack.com/p/how-to-build-data-products-deploy What processes are being followed at your org for building scalable data products? submitted by /u/growth_man [link] [comments]  ( 9 min )
    Shared Public Contextual Database for RAG [D]
    Hey Guys, It seems RAG is really taking off as an increasingly popular use case for LLMs to leverage contextual data. However, everybody is building their own contextual data sets and embedding them in their own silo'd vector dbs. Do you guys think there's any utility in having a shared public vector db that anyone can tap into their API, without having to self-host, worry about the embedding pipelines and filling the vector db with enough data in the first place for their use cases? Would this save devs alot of time in quickly testing testing product ideas? (albeit it does seem that propriety data is what everyone's raving about today) - For context, I'm building a social media product we're users can upload a few pieces (approx 10) of content (social media posts, websites, videos to start with), which becomes the verified human-curated list/Niche. We then classify and embed this into a vector db. From this, we have set up a data pipeline to scrape the web and find new content that is most similar which we suggest to users to add to the Niche (upvote, downvote style). When a piece of content is upvoted on its added to the verified list updating the Niche's classification string. Essentially we're aiming to construct an ever-growing, user-curated, contextually classified vector database from a relatively small set of sample data. submitted by /u/niksteel123 [link] [comments]  ( 9 min )
    [D] Work regarding using LLMs to generate data for downstream tasks.
    Hi. I'm curious if there have been any studies done regarding the effects of using data generated by LLMs for other downstream tasks. The closest that I could find are the two papers: Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias (Yu et al., 2023) Generating Training Data with Language Models: Towards Zero-Shot Language Understanding (Meng et al., 2022) The former focuses on studying the differences between the type of prompts that are used to generate the data and the latter doesn't use LLMs. Doesn't have to be papers, blog posts or any sort of information regarding the scenario I described is fine. Thanks. submitted by /u/Seankala [link] [comments]  ( 9 min )
    [D] Embedding models in production(CPU w/ high throughput)
    Hello, I am working on an app that requires creating lots of text embeddings(100M tokens). Looking at OpenAI Ada pricing(and considering that my app doesn't yet make any money) I'm looking into self-hosting a model to run on CPU. I know that constrains me towards smaller models-- so far locally I've been testing with sentence-transformers/all-MiniLM-L6-v2 and the query results seem okay-ish enough for my MVP. (Although, I should not that I haven't compared how embeddings with other models would perform.) Does anyone have experiences doing something similar? In particular, I'd love to hear about any tips you have for maximizing no. of embeddings / second. (new to ML/MLOps, so apologies if this is a silly question :) submitted by /u/rsamrat [link] [comments]  ( 9 min )
    [N] Introducing Stable Fast: An ultra lightweight inference optimization library for HuggingFace Diffusers on NVIDIA GPUs
    What is this? stable-fast is an ultra lightweight inference optimization library for HuggingFace Diffusers on NVIDIA GPUs. stable-fast provides super fast inference optimization by utilizing some key techniques and features: CUDNN Convolution Fusion: stable-fast implements a series of fully-functional and fully-compatible CUDNN convolution fusion operators for all kinds of combinations of Conv + Bias + Add + Act computation patterns. Low Precision & Fused GEMM: stable-fast implements a series of fused GEMM operators that compute with fp16 precision, which is fast than PyTorch's defaults (read & write with fp16 while compute with fp32). NHWC & Fused GroupNorm: stable-fast implements a highly optimized fused NHWC GroupNorm + GELU operator with OpenAI's triton, which eliminates the need…  ( 9 min )
    [R] Does the Flan T5 decoder take the question as input ?
    Hello, I was looking at the Flan T5 paper and code. It was clear that the question (instruction) and the context are given to the encoder as input. But I find no details on what does the decoder take as input apart from the fact that it starts with the pad token. Anyone can give me more details please ? Thanks ! submitted by /u/Meddhouib10 [link] [comments]  ( 9 min )
    [D] TensorFlow.js and state of the ecosystem for JavaScript
    I am curious about the state of the ecosystem for JavaScript, where TF looks like a reasonably solid option. Options I have found so far are: TensorFlow.js (looks like the most complete solution, but the general sentiment about TF in Python is pretty bad!) MediaPipe (to quickly implement specific use cases it seems, maybe using tf.js in the background?) ml5.js (a layer on top of tf.js to make it more approachable if i understand correctly) transformers.js (haven't quite grasped this one) shumai (bun only, so server side only) I am curious to read informed opinions about these and more! Have you used them and how? ​ submitted by /u/gtnbssn [link] [comments]  ( 9 min )
    [D] Which raw OPS/s benchmarks best reflect ML/DL workloads?
    Hi all. I'm preparing an open website about products used for AI/ML/DL computation (no in-house testing for now, just the database and GUI). However comparing raw speed of products of different vendors is more challenging than I anticipated, because there are many possible raw performance indicators, only few of which are provided by vendors. For example a raw performance indicator can be "FP32 vector with opportunistic optimization", while another can be "BF16 matrix/tensor without opportunistic optimization". A full picture of raw performance would be fully represented only by a table with multiple dimensions: Number format (FP64, FP32, TF32, FP16, BF16, FP8, INT8, INT4... are the others?) Vector vs. matrix/tensor operation (boolean) Opportunistic optimizations like Nvidia Sparsity…  ( 9 min )
    [D] Interesting loss graphs
    Wondering if anyone has some interesting loss graphs that they could share. Maybe loss suddenly dropped after 100 epochs, or a local minima was found and then it jumped into a lower one. Wondering if anyone forgot to turn off training and cam back to an improved result than what they thought had already been converged to. submitted by /u/HStuart18 [link] [comments]
  • Open

    Thoughts on new ChatGPT features
    I've had access to Dall-3, Vision and voice chat features, and I've been blown away by how impressive each of the new features are. Dall-E 3 seems roughly comparable to Midjourney in overall image quality, but does a much better job at understanding the prompt. The vision model continues to surprise by how well it is able to understand images at a seemingly human level of comprehension. And the voice chat is such an intuitive and captivating way of interacting with ChatGPT, it felt like I was interacting with one of the AI assistants from the movie "Her". However, it's unfortunate that these amazing new features cannot be used together at the same time. Up until gaining access to these features, I had been using the advanced data analysis model as my default, which is great for helping with programming tasks. I can only imagine how revolutionary ChatGPT will be when a cohesive multi-modal model is released sometime in the near future which has all these capabilities available from the start. What things would you want to try if such a cohesive model was released? I can already imagine some use cases where you could set up iterative improvement for things like interface design, which some people have already got to work with just the base vision model by itself. submitted by /u/ImRealNow [link] [comments]
    U.S. Tightens China's Access to Advanced Chips for Artificial Intelligence
    The Biden administration has announced additional limits on sales of advanced semiconductors by American firms to China, in an effort to restrict China's progress on supercomputing and artificial intelligence. The new rules will likely halt most shipments of advanced semiconductors from the United States to Chinese data centers, which use them to produce models capable of artificial intelligence. Chip makers seeking to sell China advanced chips or the machinery used to make them will be required to notify the government of their plans or obtain a special license. To prevent the risk of advanced U.S. chips reaching China through third countries, chip makers will also need licenses to ship to other countries subject to U.S. arms embargoes. The Biden administration argues that China's access to advanced technology is dangerous as it could aid the country's military in tasks like guiding hypersonic missiles or cracking top-secret U.S. codes. The restrictions may affect Chinese companies developing AI chatbots and could weaken China's economy in the long run, as AI is transforming industries from retail to healthcare. The limits are also expected to impact sales to China of U.S. chip makers such as Nvidia, AMD, and Intel, who earn a significant portion of their revenue from Chinese buyers. The rules will exempt chips used in commercial applications like smartphones, laptops, electric vehicles, and gaming systems. The Semiconductor Industry Association, which represents major chip makers, is evaluating the impact of the updated rules. The Biden administration has been trying to counter China's growing mastery of cutting-edge technologies by investing in new chip factories in the U.S. while setting restrictions on exports of technology to China. Source : https://www.nytimes.com/2023/10/17/business/economy/ai-chips-china-restrictions.html submitted by /u/NuseAI [link] [comments]
    Google: Data-scraping lawsuit would take 'sledgehammer' to generative AI
    Google has asked a California federal court to dismiss a proposed class action lawsuit that claims the company's scraping of data to train generative artificial-intelligence systems violates millions of people's privacy and property rights. Google argues that the use of public data is necessary to train systems like its chatbot Bard and that the lawsuit would 'take a sledgehammer not just to Google's services but to the very idea of generative AI.' The lawsuit is one of several recent complaints over tech companies' alleged misuse of content without permission for AI training. Google general counsel Halimah DeLaine Prado said in a statement that the lawsuit was 'baseless' and that U.S. law 'supports using public information to create new beneficial uses.' Google also said its alleged use of J.L.'s book was protected by the fair use doctrine of copyright law. Source : https://www.reuters.com/legal/litigation/google-says-data-scraping-lawsuit-would-take-sledgehammer-generative-ai-2023-10-17/ submitted by /u/NuseAI [link] [comments]
    [AI Dad Joke] Why did the AI stop being nice?
    It regressed to mean... PS: I read the sidebar which didn't exclude humor, and the flair seems to suggest that it would be okay, but my apologies if not. submitted by /u/Tyler_Zoro [link] [comments]
    👨🏻‍🏫 Generative AI Security Standards, LLM‘s 200K Context Window, Alibaba's Open-Source Obsession, and Baidu World 2023
    submitted by /u/trcytony [link] [comments]
    Can GPT models be financial analysts? ChatGPT, GPT-4 fail CFA exams in new study by JP Morgan, Queens University, and Virginia Tech
    Researchers evaluated ChatGPT and GPT-4 on mock CFA exam questions to see if they could pass the real tests. The CFA exams rigorously test practical finance knowledge and are known for being quite difficult. They tested the models in zero-shot, few-shot, and chain-of-thought prompting settings on mock Level I and Level II exams. The key findings: GPT-4 consistently beat ChatGPT, but both models struggled way more on the more advanced Level II questions. Few-shot prompting helped ChatGPT slightly Chain-of-thought prompting exposed knowledge gaps rather than helping much. Based on estimated passing scores, only GPT-4 with few-shot prompting could potentially pass the exams. The models definitely aren't ready to become charterholders yet. Their difficulties with tricky questions and core finance concepts highlight the need for more specialized training and knowledge. But GPT-4 did better overall, and few-shot prompting shows their ability to improve. So with targeted practice on finance formulas and reasoning, we could maybe see step-wise improvements. TLDR: Tested on mock CFA exams, ChatGPT and GPT-4 struggle with the complex finance concepts and fail. With few-shot prompting, GPT-4 performance reaches the boundary between passing and failing but doesn't clearly pass. Full summary here. Paper is here. submitted by /u/Successful-Western27 [link] [comments]
    Let's find out what GPT4 vision can do
    GPT4 vision isn't just a gimmick. We've been given a new superpower, and so we must "deal with it". This is probably as big a moment as when chatGPT first arrived, maybe more. Machine Vision for the masses (and more). I tried doing some very loose sketches, and it really struggled to identify them until they were coloured in. Humans could easily what they were. But, in order to see what uses it has, we need to know what capabilities it does and does not have. Pick a question and see what you can learn! can it use TINY images (I assume they are much faster) can it tell you what has changed in two images? can it measure distances ? (with perspective?) can it make 3d models from instructions? can it "learn" to recognise people/ similar objects (in the same context window) what limits are there to exhaustive listing exhaustive description is it better at details or overviews can it read maps / graphs / text how smart is it on DIY / xrays / mechanics can it follow wires?? (Can it find lego) is there a formal reference system you can use (X/Y) can it give co-ordinates in large grids or grid-like (how un-grid like) ie film strip, or window-panes can it navigate a 2d maze turn-by turn? 3d maze? can that be insanely complex? can it make ebay descriptions (condition) can it estimate food weight can it estimate strength / angles / volume can it create programs from screenshots. Can it use programs? games? control RC car / robot? what kind of language / instructions are best when talking about images. what other questions do we need submitted by /u/inteblio [link] [comments]
    AI pioneers LeCun, Bengio clash in intense online AI safety, governance debate
    Yann LeCun and Yoshua Bengio, two influential figures in AI and deep learning, engaged in a heated debate over the potential risks and safety concerns surrounding AI. LeCun emphasized the need to design AI systems for safety rather than imagining catastrophic scenarios. Bengio argued for the importance of prudence, stating that we still do not understand how to design safe, powerful AI systems, and highlighted the need for major investment in AI safety and governance. The debate highlighted the disagreement among esteemed researchers about AI's potential risks, the effectiveness of current safety measures, and the best path forward. The implications of AI, including job displacement, privacy violations, and existential risks, have become a topic of widespread concern. Source : https://venturebeat.com/ai/ai-pioneers-yann-lecun-and-yoshua-bengio-clash-in-an-intense-online-debate-over-ai-safety-and-governance/ submitted by /u/NuseAI [link] [comments]
  • Open

    Learn how Amazon Pharmacy created their LLM-based chat-bot using Amazon SageMaker
    Amazon Pharmacy is a full-service pharmacy on Amazon.com that offers transparent pricing, clinical and customer support, and free delivery right to your door. Customer care agents play a crucial role in quickly and accurately retrieving information related to pharmacy information, including prescription clarifications and transfer status, order and dispensing details, and patient profile information, in […]  ( 8 min )
    Keeping an eye on your cattle using AI technology
    At Amazon Web Services (AWS), not only are we passionate about providing customers with a variety of comprehensive technical solutions, but we’re also keen on deeply understanding our customers’ business processes. We adopt a third-party perspective and objective judgment to help customers sort out their value propositions, collect pain points, propose appropriate solutions, and create […]  ( 16 min )
    Personalize your search results with Amazon Personalize and Amazon OpenSearch Service integration
    Amazon Personalize has launched a new integration with Amazon OpenSearch Service that enables you to personalize search results for each user and assists in predicting their search needs. The Amazon Personalize Search Ranking plugin within OpenSearch Service allows you to improve the end-user engagement and conversion from your website and app search by taking advantage […]  ( 7 min )
  • Open

    DSC Weekly 17 October 2023
    Announcements Top Stories In-Depth The post DSC Weekly 17 October 2023 appeared first on Data Science Central.  ( 20 min )
    Uncharted digital landscapes and the quest for timeless identity
    In a recent podcast episode, Lex Freedman and Mark Zuckerberg convened in the Metaverse, where the digital realm intertwines with reality. Their astonishingly realistic interaction, while highlighting technological advancements, also prompted deeper contemplations. As the line between digital recreations and reality becomes increasingly blurred, it beckons questions about the definitions of identity and consciousness and… Read More »Uncharted digital landscapes and the quest for timeless identity The post Uncharted digital landscapes and the quest for timeless identity appeared first on Data Science Central.  ( 22 min )
    Internet Of Things (IOT):  Application In Hazardous Locations
    Introduction to Internet of Things (IOT): Internet of Things (IoT) represents the fourth-generation technology that facilitates the connection and transformation of products into smart, intelligent and communicative entities. IoT has already established its footprint in various business verticals such as medical, heath care, automobile, and industrial applications. IoT empowers the collection, analysis, and transmission of… Read More »Internet Of Things (IOT):  Application In Hazardous Locations The post Internet Of Things (IOT):  Application In Hazardous Locations appeared first on Data Science Central.  ( 23 min )
    The digital evolution in aviation: how big data and analytics are transforming the industry
    Long before passengers sit back, relax, and enjoy their flight, data has played a critical role in getting them to their seats. It has been a cornerstone of the aviation industry since the early days of air travel. Indeed, from the early 20th century, data was collected through manual processes such as pilots logging information… Read More »The digital evolution in aviation: how big data and analytics are transforming the industry The post The digital evolution in aviation: how big data and analytics are transforming the industry appeared first on Data Science Central.  ( 20 min )
  • Open

    "STARC: A General Framework For Quantifying Differences Between Reward Functions", Skalse et al 2023
    submitted by /u/gwern [link] [comments]
    "Goodhart's Law in Reinforcement Learning", Karwoski et al 2023
    submitted by /u/gwern [link] [comments]
    Dynamic state and action space
    Hello, I’m working on a scenario that involves many systems and each system involves many subsystems. At each decision time and according to the system that requests the decision, the RL agent must select a subsystem. Nevertheless, each system has a different number of subsystems which makes the action space and the state space dynamic since the each neurone in the output represents a subsystem. Can I use the maximal number of subsystems (not the total number) as the number of the output and masking some neurones according to the current system ? submitted by /u/GuavaAgreeable208 [link] [comments]
    Offline rl- interpreting policy
    I am new to RL and have a naive question. How interpretable would the policy be from building a rl algorithm in an offline setting? Could I make inferences about what the optimal sequences would be? submitted by /u/kwsunshine123 [link] [comments]
  • Open

    Goal Representations for Instruction Following
    Goal Representations for Instruction Following Figure title. Figure caption. This image is centered and set to 50% page width. --> A longstanding goal of the field of robot learning has been to create generalist agents that can perform tasks for humans. Natural language has the potential to be an easy-to-use interface for humans to specify arbitrary tasks, but it is difficult to train robots to follow language instructions. Approaches like language-conditioned behavioral cloning (LCBC) train policies to directly imitate expert actions conditioned on language, but require humans to annotate all training trajectories and generalize poorly across scenes and behaviors. Meanwhile, recent goal-conditioned approaches perform much better at general manipulation tasks, but do not enable easy t…  ( 7 min )
  • Open

    Striking Performance: Large Language Models up to 4x Faster on RTX With TensorRT-LLM for Windows
    GeForce RTX and NVIDIA RTX GPUs, which are packed with dedicated AI processors called Tensor Cores, are bringing the power of generative AI natively to more than 100 million Windows PCs and workstations.  ( 7 min )
    NVIDIA RTX Video Super Resolution Update Enhances Video Quality, Detail Preservation and Expands to GeForce RTX 20 Series GPUs
    NVIDIA today announced an update to RTX Video Super Resolution (VSR) that delivers greater overall graphical fidelity with preserved details, upscaling for native videos and support for GeForce RTX 20 Series GPUs.  ( 7 min )
  • Open

    New technique helps robots pack objects into a tight space
    Researchers coaxed a family of generative AI models to work together to solve multistep robot manipulation problems.  ( 11 min )
  • Open

    Model metamers reveal divergent invariances between biological and artificial neural networks
    submitted by /u/Chipdoc [link] [comments]  ( 8 min )

  • Open

    Article: Key Concepts and Open Questions in a Golden Age for Natural Language Understanding
    submitted by /u/Stanford_Online [link] [comments]
    DexCatch: Learning to Catch Arbitrary Objects with Dexterous Hands
    🌟 Excited to share our recent research, DexCatch! Pick-and-place is slow and boring, while throw-catching is a behaviour towards more human-like manipulation. We propose a new model-free framework that can catch diverse objects of daily life with dexterous hands in the air. This ability to catch anything from a cup to a banana, and a pen, can help the hand quickly manipulate objects without transporting objects to their destination -- and even generalize to unseen objects. Video demonstrations of learned behaviors and the code can be found at https://dexcatch.github.io/. ​ https://reddit.com/link/17973ri/video/i4xdo39d4lub1/player submitted by /u/Shengjie_Wang [link] [comments]
    Help with Model Based Policy Optimization
    I am reading this paper and came across the following paragraph - ​ "Model usage. Many recent model-based algorithms have focused on the setting in which model rollouts begin from the initial state distribution (Kurutach et al., 2018; Clavera et al., 2018). While this may be a more faithful interpretation of Algorithm 1, as it is optimizing a policy purely under the state distribution of the model, this approach entangles the model rollout length with the task horizon. Because compounding model errors make extended rollouts difficult, these works evaluate on truncated versions of benchmarks. The branching strategy described in Section 4.2, in which model rollouts begin from the state distribution of a different policy under the true environment dynamics, effectively relieves this limitation. In practice, branching replaces few long rollouts from the initial state distribution with many short rollouts starting from replay buffer states." ​ What does state distribtion mean over here? Also in line 8 of the image, I don't understand what's the relation between model rollout and policy \pi_t. Is it saying, use the model free algorithm to take future steps from that state? What does the model have to do with that? ​ https://preview.redd.it/twlej5my3kub1.png?width=1182&format=png&auto=webp&s=4a515c8d237c963052bc1b60a9e7dda53a33f001 submitted by /u/Academic-Rent7800 [link] [comments]
    math prerequisites for reinforcement learning research?
    hi all! i’m an undergraduate that is really interested in pursuing a PhD. i think reinforcement learning is especially interesting, causal reinforcement learning in particular. for my current research job, which unfortunately doesn’t really involve ML, i read a little about causal inference and it really intrigued me. what mathematics courses should i take to get into RL research at a theoretical/algorithmic level? i am currently taking proof-based linear algebra, and have taken all the computational calculus offered. i imagine prob. theory/math stats is pretty important, too; what else? submitted by /u/treeman0469 [link] [comments]
  • Open

    [D] Exploring Methods to Improve Text Chunking in RAG Models (and other things...)
    Hello everyone, I'm currently working on Retrieval Augmented Generation (RAG) models and have developed a custom chunking function, as I found the methods in LangChain not entirely satisfactory. I'm keen on exploring other methods, algorithms (related to NLP or otherwise), and models to enhance text chunking in RAG. There are many RAG implementations out there, but I've noticed a lack of focus on improving chunking performance specifically. Are there any other promising approaches beyond my current pipeline, which consists of a bi-encoder (retriever), cross-encoder (reranker), and a Large Language Model (LLM) for interactions? For queries, I'm using both traditional and HyDE (Hypothetical Document Embedding) approaches in the retrieval phase, and sending the top 'n' results of both similarity search to the reranker. I've also tried using an LLM to convert the query into a series of 10-20 small phrases or keywords, which are then used as the query for the retriever model. However, the results vary depending on the LLM used. To generate good keywords (with a not extractive approach) , I had to use a "CoT" prompt, instructing the model to write self-instruct, problem analysis and reasonings before generating the required keywords. But this approach use lots of tokens, and requires careful scraping to ensure the model has used the right delimiter to separate reasoning and the actual answer. I'm also planning to modify the text used to generate embeddings, while returning the original text after the recall phase. But this is still a work in progress and scaling it is proving to be a challenge. If anyone has any tips or experience with this, I'd appreciate your input. I'd be grateful for any resources, repositories, libraries, or existing implementations of novel chunking methods that you could share. Or we could just discuss ideas, thoughts, or approaches to improve text chunking for RAG here. Thanks in advance for your time! submitted by /u/BXresearch [link] [comments]  ( 9 min )
    [D] Rate my GPU server for Deep Learning
    I started learning deep learning last year and decided to step up my game with regard to model training and tools. I recently built a GPU server. It’s still within its return period, so please help decide if it’s worth keeping: Processor: 2x Xeon E5-2690 v4 2.6GHz 14-Core Memory: 128GB GPU: 8x NVIDIA Tesla P100 16GB HBM2 Accelerator Card Total cost: ~$3200 submitted by /u/Stonks-Stocks [link] [comments]  ( 9 min )
    [N] "How to Apply to Grad School" webinars by CMU RI!
    We are hosting a few "How to Apply to Grad School" webinars this week. This is a chance to hear from faculty and students in the Robotics Institute at CMU on what life in grad school is actually like, as well as get some tips on crafting a strong application! https://cmu-ri-resources.github.io/ submitted by /u/bart-ai [link] [comments]  ( 9 min )
    [R] Microsoft presents Table-GPT: Table-tuned GPT for Diverse Table Tasks
    Tables pack tons of relational data but are tough for AI to grasp. They have complex 2D structure with information scattered across rows and columns. Models like GPT-3 fail basic tasks like finding where a missing value should go. LLMs struggle at this because they're pre-trained mostly on natural text, which is linear. Researchers at Microsoft wanted to mitigate this with "table-tuned" models, trained on table-related tasks. Their process: Automatically generate lots of diverse table-task training cases from a corpus of real-world tables. Ex: "impute missing value" or "identify error in table". Further augment data via paraphrasing, shuffling table rows/columns, chaining model responses, etc. This table-tuning produced "Table-GPT" models with substantially stronger table skills. In experiments, Table-GPT crushed vanilla GPT-3: 25%+ better on unseen table tasks like missing value ID and column type ID Beat GPT-3 on 98% of test cases across 9 different table tasks Stayed superior after downstream tuning too There's tons more work to do but seems pretty promising. Table-tuning boosted models' ability to comprehend tables and reason over tabular data vs just pre-training on text. TLDR: Training AI models more on synthesized table tasks ("table-tuning") significantly improves their table skills. Full summary is here. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [D] Text-to-pose?
    When are we getting a text-to-pose ai? I'd love to be able to generate poses for 3d models that match a given text description, because sometimes what my mind comes up with doesn't feel adequate. It's frustrating that I'm not seeing any developments in this area of ai, and I lack the skills to commence the developments myself. submitted by /u/BM09 [link] [comments]  ( 9 min )
    [R] Google Pali-3 Vision Language Models: Contrastive Training Outperforms Classification
    submitted by /u/currentscurrents [link] [comments]  ( 8 min )
    [D] Sources of esoteric data? Specifically looking for 6dof motion data from a medium to large oceangoing vessel underway in various sea states.
    I am ok with paying for the data. I just can't find any sources for it. I found some data on github that appears to come from container ships at port, but nothing for a ship underway. submitted by /u/jschall2 [link] [comments]  ( 9 min )
    [D] Adding a modality to a pre-trained model
    Hi, I have a dataset with video and other modalities (e.g. audio), and I want to run a captioning task. I found UniVL, which is a pre-trained model that supports video and text (transcripts) and can caption them. It extracts features and runs transformer encoders on both these modalities to get an embedding, then concatenates them and feeds it into a cross-encoder and decoder to get captions. I'm wondering if I can make use of this model, but add in other modalities, by writing my own embedding model and feeding the embeddings into the cross encoder. Would this work? Is there any similar previous work regarding adding new modalities to a pre-trained network? submitted by /u/joeswansonx69x [link] [comments]  ( 9 min )
    [D] What is the current SOTA of Neural Architecture Search (NAS)?
    I've seen classic papers before 2021 that have been quite influential - RL and evolution based strategies. I have also seen: differentiable approaches: https://arxiv.org/abs/1806.09055 zero-learning approaches: https://arxiv.org/abs/2006.04647 But these are all papers pre-2021. From people who are familiar with this field, what is the current SOTA of neural architecture search (NAS) post 2022? i.e. papers that can serve as the most relevant baselines? Thank you! :) ​ ​ ​ submitted by /u/Cultural-Average3959 [link] [comments]  ( 9 min )
    [D] Is active learning a dying field in industry, given the development in few shot/zero shot learning?
    Is active learning a dying topic when zero shot learning came out? Active learning is to used few labeled samples plus a initially trained model to select the most useful unlabeled data for training. Zero/few shot learning is to train a model on some data then Mae it work directly with unseen label/data. In my understanding, zero/few short learning is more aligned with the current large model trend or foundation model trend. Active learning strategy seems to still rely on small dataset and was intending to gradually enrich training data by selecting new samples in. In industry and in big tech, which one is more used or deployed? Anyone can give me some comments? submitted by /u/Little-Bumblebee-452 [link] [comments]  ( 9 min )
    [R] Decoding LLM Uncertainties for Better Predictability
    Hi all, Building off our last research post, we wanted to figure out ways to quantify "ambiguity" and "uncertainty" in prompts/responses to LLMs. We ended up discovering two useful forms of uncertainty: "Structural" and "Conceptual" uncertainty. In a nutshell: Conceptual uncertainty is when the model isn't sure what to say, and Structural uncertainty is when the model isn't sure how to say it. You can play around with this yourself in the demo or read about it in more detail in the blog post submitted by /u/shayanjm [link] [comments]  ( 9 min )
    [D] For large datasets, is your data selection process limiting model performance?
    I often hear from folks with very large datasets saying: “my labelling costs keep increasing, but we don’t see model performance improvements” or “my storage and compute costs are rising (for a dataset of 1M+ images) but performance just stalled”. This post argues that large datasets have hidden costs, beyond time and money, poor data quality and the wrong selection process might be killing model performance. Any thoughts? Have you faced this challenge? submitted by /u/btcmx [link] [comments]  ( 9 min )
    [P] SemanticSearch for PDF mining
    Hello, everyone! I'm seeking tips to enhance my semantic search pipeline. Currently, I'm working on a semantic search tool. Given a set of text files, my goal is to retrieve the most relevant information related to the query. To achieve this, I begin by preprocessing the PDF files, splitting them into pages, and computing embeddings using a fine-tuned BERT model for Italian. Next, with a query and its embedding, I calculate the cosine similarity to all the pages in the document. Since there aren't many pages, a brute search remains quite fast. However, I'm encountering an issue where the similarity results don't consistently yield the most relevant information. I've experimented with various embedding layers, but there's been little to no improvement. I've also tested a commercially available solution to ensure the problem isn't with my PDF files. Interestingly, I achieved better results, leading me to believe that the issue may lie within my pipeline. My current hypothesis is that the page splitting process might be excluding relevant semantic connections, and I may need to improve my text preprocessing. What suggestions do you have to enhance my results? P.S. The information obtained from the similarity check is subsequently used as context with a chat language model, similar to tools like AsMyPdf. submitted by /u/AcquaFisc [link] [comments]  ( 9 min )
    [D] Good compression algo to compress model checkpoints?
    I have a couple of terabytes of checkpoints, and I desperately need to free up some space, without deleting those atm. Is there a compression algorithm that can handle such data successfully? I tried gzip with tar but the compressed size ended up being only ~100G less - that's when I realized that (gzip) compression algo is not good at handling seemingly random numerical data. Do you know of methods that've proven to work in this scenario? submitted by /u/OpeningVariable [link] [comments]  ( 9 min )
    [R] Think before you speak: Training Language Models With Pause Tokens
    https://arxiv.org/pdf/2310.02226.pdf Abstract Language models generate responses by producing a series of tokens in immediate succession: the (K+1)th token is an outcome of manipulating K hidden vectors per layer, one vector per preceding token. What if instead we were to let the model manipulate say, K+10 hidden vectors, before it outputs the (K+1)th token? We operationalize this idea by performing training and inference on language models with a (learnable) pause token, a sequence of which is appended to the input prefix. We then delay extracting the model's outputs until the last pause token is seen, thereby allowing the model to process extra computation before committing to an answer. We empirically evaluate pause-training on decoder-only models of 1B and 130M parameters with causal pretraining on C4, and on downstream tasks covering reasoning, question-answering, general understanding and fact recall. Our main finding is that inference-time delays show gains when the model is both pre-trained and finetuned with delays. For the 1B model, we witness gains on 8 of 9 tasks, most prominently, a gain of 18% EM score on the QA task of SQuAD, 8% on CommonSenseQA and 1% accuracy on the reasoning task of GSM8k. Our work raises a range of conceptual and practical future research questions on making delayed next-token prediction a widely applicable new paradigm. Here is a Medium post about my thoughts on the paper. submitted by /u/transformer_ML [link] [comments]  ( 9 min )
    [D] Can Direct Preference Optimization (DPO) be used to replace any type of RL for LLMs, or is it better suited for just scenarios like RLHF?
    DPO Paper I read a really fascinating paper where RL was used on LLMs to make them better at interacting in embodied environments. https://arxiv.org/abs/2310.08588 The technique was called Reinforcement Learning with Environmental Feedback (RLEF). In the paper PPO was used, but I'm wondering if DPO could be used to replace it? submitted by /u/30299578815310 [link] [comments]  ( 9 min )
    How to create dataset for training generative chatbot model? [D]
    i built my own custom generative ai chatbot model. only thing i need is high quality and diverse dataset to train my model. i cant use already existing datasets because i dont think they are diverse and quality enough.so i need to create it using gpt4. my dataset will have 3 columns ; system_prompt, input, output. but im not very experienced on creating datasets, and i couldnt find any resources about this. all input ,output and system prompt all should be created by gpt4. how can i do it? and what is most effective way to use api for this? submitted by /u/Many-Corner-6700 [link] [comments]  ( 9 min )
    [P] MergeLlama-7b - A fine tune of CodeLlama for resolving merge conflicts
    Merge conflicts are something that give developers hours of headaches and I figured I would try and give my take on a solution. I followed a paper from IEEE engineers in 2022 who trained CodeBert on merge conflicts as a classification task, and they published their dataset for public use. Input formatted as “>>>>>>” will output the attempted conflict resolution. I am still trying to find out how to do evaluations on this model as the loss applies to all sections not just the resolution, and the TRL Trainer with a data collator gives NaN as a loss. The model and dataset are on HuggingFace under codys12/MergeLlama and codys12/MergeLlama-7b. Any feedback is appreciated! submitted by /u/cstein123 [link] [comments]  ( 9 min )
    [P] OpenLLMetry, a way to get complete visibility into RAG pipelines with your existing tools
    Hey, I've built a set of extensions for OpenTelemetry that provides visibility into LLM applications like RAG pipelines - whether it be prompts, vector DBs and more. Here’s the repo: https://github.com/traceloop/openllmetry. Two key benefits with OpenTelemetry are - You can trace your entire system execution, not just the LLM (so you can see how requests to DBs, or other calls affect the overall result); You can connect to any monitoring platform—no need to adopt new tools. Install the SDK and plug it into Datadog, Sentry, or both. Or switch between them easily. There's already support for OpenAI, Anthropic, Cohere, Pinecone, Chroma, LangChain, and Haystack and we are working hard to support the entire ecosystem. Would love to hear your thoughts submitted by /u/nirga [link] [comments]  ( 9 min )
    Can AI Replace Developers? Princeton and University of Chicago's SWE-bench Tests AI on Real Coding Issues [N]
    Exploiting AI to make software programming easier? SWE-bench, a unique evaluation system, tests language models' ability to solve real GitHub-collated programming issues. Interestingly, even top-notch models manage only the simplest problems, underscoring tech development's urgency for providing practical software engineering solutions. For the latest advancements in AI, look here first. https://preview.redd.it/rq5vl22bckub1.png?width=1292&format=png&auto=webp&s=d79988bfe0ab37b0f97f55296d7a7341c9292c11 A New Approach to Evaluating AI Models Researchers use real-world software engineering problems from GitHub to assess language models' coding problem-solving skills. SWE-bench, introduced by Princeton and the University of Chicago, offers a more comprehensive and challenging benchmark…  ( 9 min )
    How to design a Chat-GPT or Bard-like large scale app with your own foundational model? [D]
    I am just puzzled how does one efficiently query a huge transformer model such that so many users can be served at the same time. Is it queried on per user basis? (modulo some caching) If yes, how expensive is this? If no, what the hell is going on? :D Are there any good resources on this? (how to build large scale apps with big models, from scratch). Somehow this doesn't really fit the standard data-intensive system design process, or maybe I am missing something. submitted by /u/jimmymvp [link] [comments]  ( 9 min )
  • Open

    Taken on my screen, but I can’t get over what it has become. I’m obsessed with AI.
    submitted by /u/Prestigious_Rough704 [link] [comments]
    I built an AI tool to help authors create webcomics
    I always did want to draw a comic but I was never very good at drawing even though I put a lot of effort into it when I was younger... :'( So when I stumbled on image generation AI, I thought maybe it could help me transform my doodles into something decent. It took me a while and a lot of effort to write a tool to help me with that : story and dialogues are my own, images are based on doodles enhanced by AI. I would love to have feedback about the story : https://stripik.com/story/4/chapter/4/ ​ https://preview.redd.it/dvcudd4j3mub1.png?width=800&format=png&auto=webp&s=717bef60eaaf9b9a35a1a66f266c374406a923fa submitted by /u/maxcmoi [link] [comments]
    I'm chronicling the process of trying to create a boardgame with Chat GPT and it's amazing just how great of an assistant it is!
    submitted by /u/SexyJimBelushi [link] [comments]
    If SEO tools were Nintendo 3DS games [Powered by AI]
    Did you play these (SEO) games? 👾 https://preview.redd.it/yxuzllzupkub1.jpg?width=661&format=pjpg&auto=webp&s=23ebc6e972ac85b152aa8b69f48e2b0c5bae2c76 https://preview.redd.it/x8zfokzupkub1.jpg?width=661&format=pjpg&auto=webp&s=be2163a7bfbeee64a63c1292a5b4c482c5be33ae https://preview.redd.it/eerpgnzupkub1.jpg?width=661&format=pjpg&auto=webp&s=d8eceafd3732653c743a6731ae5932c9e0da071c https://preview.redd.it/uxwgskzupkub1.jpg?width=661&format=pjpg&auto=webp&s=07c751eaa16f8fa484034c98a3c1fd0b2162f5a2 Source: https://twitter.com/carlos_darko/status/1713900305765605484 submitted by /u/DanielPeris [link] [comments]
    Can AI Replace Developers? Princeton and University of Chicago's SWE-bench Tests AI on Real Coding Issues
    Exploiting AI to make software programming easier? SWE-bench, a unique evaluation system, tests language models' ability to solve real GitHub-collated programming issues. Interestingly, even top-notch models manage only the simplest problems, underscoring tech development's urgency for providing practical software engineering solutions. For the latest advancements in AI, look here first. https://preview.redd.it/8laeg7cbckub1.png?width=1292&format=png&auto=webp&s=e549f0045a7253cd2d3f351d8297a301c4cbf6ac A New Approach to Evaluating AI Models Researchers use real-world software engineering problems from GitHub to assess language models' coding problem-solving skills. SWE-bench, introduced by Princeton and the University of Chicago, offers a more comprehensive and challenging benchmark…
    Deep fake language change
    What is the best free tool to make a video where the language changes? submitted by /u/Easy_Technology6768 [link] [comments]
    One-Minute Daily AI News 10/15/2023
    New York-based tech firms and investors see the advent of AI as the latest opportunity to try to unseat the Bay Area as tech’s global capital.[1] Microsoft announced a new “bug bounty” program, vowing to reward security researchers between $2,000 and $15,000 if they’re able to find “vulnerabilities” in its Bing AI products, including “jailbreak” prompts that make it produce responses that go against the guardrails that are supposed to bar it from being bigoted or otherwise problematic.[2] OpenAI is preparing to launch a suite of updates to make it more cost-effective and efficient for developers to create software applications with AI models.[3] TCS Seeks to Use Microsoft AI Partnership to Improve Margins.[4] Sources: [1] https://www.axios.com/2023/10/12/new-york-ai-world-capital [2] https://futurism.com/the-byte/microsoft-bing-ai-bug-bounty [3] https://www.techedt.com/openai-aims-to-attract-developers-with-cost-effective-updates-insiders-reveal [4] https://www.bloomberg.com/news/articles/2023-10-15/tcs-seeks-to-use-microsoft-ai-partnership-to-improve-margins#xj4y7vzkg submitted by /u/Excellent-Target-847 [link] [comments]
    Are there an image generators that can generate the same image you upload to it, but from a different hypothetical angle?
    I was wondering if any AI image generation was good at this (yet?). I have a real-life image I want to upload and get AI to generate what that would most likely look like from the vantage point of someone standing at a different angle. submitted by /u/YepperyYepstein [link] [comments]
    AI dubbing ( local )
    Hi there, anybody knows how AI dubbing translator works ? As im interested if something similiar to https://app.rask.ai/ exist localy ?? Is there anything from github? Im looking for czech language. I know you can scribe audio to text than translate text and let AI to talk this text. But is there a tool that do all of this in one click ? Thank you and have a nice day. submitted by /u/Low_Government_681 [link] [comments]
  • Open

    A method to interpret AI might not be so interpretable after all
    Some researchers see formal specifications as a way for autonomous systems to "explain themselves" to humans. But a new study finds that we aren't understanding.  ( 9 min )
  • Open

    How Veriff decreased deployment time by 80% using Amazon SageMaker multi-model endpoints
    Veriff is an identity verification platform partner for innovative growth-driven organizations, including pioneers in financial services, FinTech, crypto, gaming, mobility, and online marketplaces. In this post, we show you how Veriff standardized their model deployment workflow using Amazon SageMaker, reducing costs and development time.  ( 8 min )
  • Open

    Improving traffic evacuations: A case study
    Posted by Damien Pierce, Software Engineer, and John Anderson, Senior Research Director, Google Research Some cities or communities develop an evacuation plan to be used in case of an emergency. There are a number of reasons why city officials might enact their plan, a primary one being a natural disaster, such as a tornado, flood, or wildfire. An evacuation plan can help the community more effectively respond to an emergency, and so could help save lives. However, it can be difficult for a city to evaluate such a plan because it is not practical to have an entire town or city rehearse a full blown evacuation. For example, Mill Valley, a city in northern California, created a wildfire evacuation plan but lacked an estimate for how long the evacuation would take. Today we describe a c…  ( 94 min )
  • Open

    DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
    How trustworthy are generative pre-trained transformer (GPT) models? To answer this question, University of Illinois Urbana-Champaign, together with Stanford University, University of California, Berkeley, Center for AI Safety, and Microsoft Research, released a comprehensive trustworthiness evaluation platform for large language models (LLMs), which is presented in the recent paper: DecodingTrust: A Comprehensive Assessment of Trustworthiness […] The post DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models appeared first on Microsoft Research.  ( 11 min )
  • Open

    Explainable Artificial Intelligence (XAI) for AI & ML Engineers
    Introduction Hello AI&ML Engineers, as you all know, Artificial Intelligence (AI) and Machine Learning Engineering are the fastest growing fields, and almost all industries are adopting them to enhance and expedite their business decisions and needs; for the same, they are working on various aspects and preparing the data for the AIML platform with the help of SMEs… Read More »Explainable Artificial Intelligence (XAI) for AI & ML Engineers The post Explainable Artificial Intelligence (XAI) for AI & ML Engineers appeared first on Data Science Central.  ( 23 min )
  • Open

    Nearest, easiest, and most accessible
    From Love What Lasts, Joshua Gibbs: … there are too many things in the world to care equally about them all. The sheer volume of things … demands that we have hierarchical standards by which to judge their value, or else we are condemned to give our lives over entirely to what is nearest, easiest, […] Nearest, easiest, and most accessible first appeared on John D. Cook.  ( 4 min )
  • Open

    Benchmarking Bit Errors in Quantized Neural Networks with PyTorch
    Similar to my article series on adversarial robustness, I was planning to have a series on bit errors robustness accompanied by PyTorch code. Instead, due to time constraints, I decided to condense the information into a single article. The code for the originally planned six articles is available on GitHub. The post Benchmarking Bit Errors in Quantized Neural Networks with PyTorch appeared first on David Stutz.  ( 6 min )
  • Open

    Rethinking the Role of PPO in RLHF
    Rethinking the Role of PPO in RLHF TL;DR: In RLHF, there’s tension between the reward learning phase, which uses human preference in the form of comparisons, and the RL fine-tuning phase, which optimizes a single, non-comparative reward. What if we performed RL in a comparative way? Figure 1: This diagram illustrates the difference between reinforcement learning from absolute feedback and relative feedback. By incorporating a new component - pairwise policy gradient, we can unify the reward modeling stage and RL stage, enabling direct updates based on pairwise responses. Large Language Models (LLMs) have powered increasingly capable virtual assistants, such as GPT-4, Claude-2, Bard and Bing Chat. These systems can respond to complex user queries, write code, and even produce poetry. T…  ( 6 min )

  • Open

    Johnson circle theorem
    Draw three circles of radius r that intersect at a single point. Then draw a triangle connecting the remaining three points of intersection. (Each pair of circles intersects in two points, one of which is the point where all three circles intersect, so there are three other intersection points.) Then the circumcircle of the triangle, […] Johnson circle theorem first appeared on John D. Cook.  ( 5 min )
  • Open

    NVIDIA Blackwell B100 GPUs To Feature SK Hynix HBM3e Memory, Launches In Q2 2024 Due To Rise In AI Demand
    submitted by /u/norcalnatv [link] [comments]
    Researchers propose GameGPT: A multi-agent approach to fully automated game development
    Game dev is super complex nowadays - games have huge codebases, massive teams, and dev cycles dragging on for years. Costs are insane too - budgets can hit $100M+ easily. In a new paper, researchers propose to reverse this trend with an AI framework called GameGPT that automates parts of the dev process using multiple AI agents. Each agent handles a different role (all are fine-tuned from relevant base models): One agent reviews the game design plan to catch errors Another turns tasks into code implementations Reviewer agents check the code and results A testing agent validates everything works as expected By breaking up the workflow, GameGPT can simplify things for the AI agents. They just focus on a narrow role versus having one jack-of-all-trades agent. The authors argue GameGPT can eliminate repetitive and rote elements of gamedev like testing. This would free up developers to focus on creative design challenges. However, the GameGPT paper does not include any concrete results or experiments demonstrating improved performance. There is no evidence presented that GameGPT reduces hallucinations, redundancy or development time. The authors mention empirical results support their claims that the architecture is more effective, but none are provided. I could not find any additional support material about this work, like a project website, that I could use to further check into this (maybe someone can share in the comments?). Right now GameGPT seems mostly conceptual. The ideas are interesting but hard to assess without quantitative results. TLDR: New GameGPT AI framework aims to automate tedious parts of game development using specialized agents. No concrete results were provided in the paper - someone will need to test this out and report back. Full summary here. Paper is here. submitted by /u/Successful-Western27 [link] [comments]
    Speech Condenser: An Advanced On-Premise Pipeline Tool for Streamlining and Summarizing Dialogues from Videos
    submitted by /u/nez_har [link] [comments]
    Best way to produce consistent images?
    Hi! I'm trying to jazz up my design portfolio for applying for jobs, and I wanted to insert some cute illustrations on each project page. The projects deal with a variety of topics so I'll need pictures of many things, but want to keep the style quite consistent. What is the best AI tool right now to do this? I paid for Midjourney but I can't seem to understand how to get it to do this. For example I got this image from DALLE and love the style, the white background also helps make it look better on the portfolio. I'd want another image in the same style of two kids throwing a ball, but can't figure out how to do it. Alternatively if I could upload this image to an AI and say "in the same style, generate..." that would be great too. Thank you! submitted by /u/_Dip_ [link] [comments]
    Messi vs Ronaldo | Freestyle Rap Song | AI Rap Song | Tell your opinion on this video
    submitted by /u/Agitated-Spell3979 [link] [comments]
    Seeking Your Feedback on a new community around Open-Source AI Code Generation Models
    Currently, we are building a community that is specifically dedicated to Open-Source AI Code Generation Models. Our aim is to create a thriving ecosystem where developers, enthusiasts, and experts can come together to drive innovation, share insights, and promote a collaborative approach to AI code generation. I wanted to provide you with an overview of the key features we're integrating into this community: 1. Collaboration: A dedicated space where enthusiasts and experts alike can collaborate on projects, share their findings, and work on enhancing existing models. 2. Discussion: Whether through forums or chat platforms, we aim to foster discussions around the challenges, breakthroughs, and best practices in the realm of AI code generation. 3. Resource Sharing: Our community will feature a repository/platform for members to freely share and access open-source models, datasets, and other essential tools. With your experience and insight into the AI domain, we would greatly appreciate your feedback on the following:- - Do you believe such a community would be valuable to you personally or to the wider developer community? - Would you consider becoming a part of such a community? - You are already a part of such a community and this one might not be of much value to you? - Any other suggestions or feedback? Your candid feedback on this idea, its potential impact, and any suggestions you might have will be invaluable to us as we continue shaping this community's structure and offerings. submitted by /u/akanshtyagi [link] [comments]
    Biden eyes adding AI chip curbs to Chinese companies abroad
    The Biden administration is considering closing a loophole that gives Chinese companies access to American artificial intelligence (AI) chips through units located overseas. The United States previously restricted shipments of AI chips to China but left overseas subsidiaries of Chinese companies with unfettered access. The Biden administration is now looking for ways to close this loophole and prevent China from accessing top AI technology. However, it is challenging to plug every gap in export controls. Chinese firms are purchasing chips for use in data centers abroad, and it is difficult for the United States to police those transactions. The United States has been seeking to halt the rise of China's AI capability, which depends on its access to U.S. chips. Washington has been working to close other loopholes that allow AI chips into China, and the new rules expected this month will likely apply those same restrictions more broadly to all companies in the market. The U.S. government is also grappling with the issue of Chinese parties accessing U.S. cloud providers like Amazon Web Services. Overall, the Biden administration is facing challenges in cutting China off from top AI technology and closing all loopholes in export controls. Source : https://www.reuters.com/technology/biden-eyes-adding-ai-chip-curbs-chinese-companies-abroad-2023-10-13/ submitted by /u/NuseAI [link] [comments]
  • Open

    SOTA Facial Recognition [D]
    I want to sort folders of pictures of people that are similar to an input photo by similarity. I managed to use DeepFace but I'm wondering if anyone knows a better method? ​ submitted by /u/RedditAlreaddit [link] [comments]  ( 9 min )
    [R] Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency
    Paper: https://arxiv.org/abs/2309.17382 Project page: https://agentification.github.io/RAFA Code: https://github.com/agentification/RAFA_code Reason for future, act for now (RAFA) TL;DR: - The first autonomous LLM agent RAFA with provable regret guarantees and outstanding empirical performances. - SOTA results on Game of 24, ALFWorld, BlocksWorld, and Tic-Tac-Toe. submitted by /u/WolverineUnable5957 [link] [comments]  ( 9 min )
    [P] Machine Learning Algorithm from Scratch
    submitted by /u/shaongit [link] [comments]  ( 8 min )
    [D]Was any further work done on the paper "Large-Scale Study of Curiosity-Driven Learning" in recent years?
    So, a few weeks ago, I got interested in the exploration problem in Reinforcement Learning and came across this amazing paper. Just wanted to know if any of you came across any paper which explores this idea more or takes it forward. Thanks in advance. submitted by /u/Interesting-Weeb-699 [link] [comments]  ( 9 min )
    [R] tool to brainstorm novel ideas
    Hey folks, I developed a research tool https://idea-factory.ngrok.dev/ (Login: [temp@holistic-intelligence.net](mailto:temp@holistic-intelligence.net) Password: noidea) to identify novel research problems grounded in the scientific literature. Given an idea that intrigues you, the tool identifies the most relevant pieces of literature, creates a brief summary, and provides three possible extensions of your idea. I would be happy to get your feedback on the usefulness of them. Thank you in advance! submitted by /u/Ma7dy [link] [comments]  ( 9 min )
    [P] Oddly Satisfying Animation of Pixel Shuffle
    submitted by /u/Animated-AI [link] [comments]  ( 8 min )
    [D] Pipeline for data processing in time series forecasting?
    What is the correct pipeline for data processing when conducting time series forecasting? Should we begin with data normalization/standardization, followed by feature selection, and then split the data into training, validation, and test sets? Or is it advisable to initially split the data to prevent spill-over effects? I'm concerned about the possibility of training my model on (part of) the test data, which could result in spill-over effects. However, if the recommended approach is to split the data first and then perform normalization and feature selection, what impact would this have on the selected features? Does the manner in which we split the data into random time periods matter, or is it necessary to incorporate a validation method that accounts for temporal effects? I'm worried that the selected features might depend on the time period I choose for my training and test sets. What is the best practice in this scenario? submitted by /u/Ambitious-Pay6329 [link] [comments]  ( 9 min )
    [R] Researchers propose GameGPT: A multi-agent approach to fully automated game development
    Game dev is super complex nowadays - games have huge codebases, massive teams, and dev cycles dragging on for years. Costs are insane too - budgets can hit $100M+ easily. In a new paper, researchers propose to reverse this trend with an AI framework called GameGPT that automates parts of the dev process using multiple AI agents. Each agent handles a different role (all are fine-tuned from relevant base models): One agent reviews the game design plan to catch errors Another turns tasks into code implementations Reviewer agents check the code and results A testing agent validates everything works as expected By breaking up the workflow, GameGPT can simplify things for the AI agents. They just focus on a narrow role versus having one jack-of-all-trades agent. The authors argue GameGPT can eliminate repetitive and rote elements of gamedev like testing. This would free up developers to focus on creative design challenges. However, the GameGPT paper does not include any concrete results or experiments demonstrating improved performance. There is no evidence presented that GameGPT reduces hallucinations, redundancy or development time. The authors mention empirical results support their claims that the architecture is more effective, but none are provided. I could not find any additional support material about this work, like a project website, that I could use to further check into this (maybe someone can share in the comments?). Right now GameGPT seems mostly conceptual. The ideas are interesting but hard to assess without quantitative results. TLDR: New GameGPT AI framework aims to automate tedious parts of game development using specialized agents. No concrete results were provided in the paper - someone will need to test this out and report back. Full summary here. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [D] Generate audio samples based on promp sample
    hi, I would like to create a system that generate different audio samples, based on an audio sample prompt. Does anyone know whether such a project or similar ideas have been already implemented? Or any suggestion on what to read in order to realize such a project? I have knowledge in ML programming and python audio generation. submitted by /u/busconw [link] [comments]  ( 9 min )
    [D] Getting bad MFUs, what can I do to make it better
    Hi, so I've been working with NanoGPT, finetuning GPT-2, and I'm getting terrible MFUs, with 5 warmup steps at -100% and normal steps have an MFU of around 3-4%. Most runs I hear of have an MFU at around 45%? How do get this better? Colab -> https://colab.research.google.com/drive/1gvTsyjxHiDkKHFsnWWouzr1xJWW23BA3?usp=sharing Code -> https://github.com/VatsaDev/NanoPhi2 submitted by /u/vatsadev [link] [comments]  ( 9 min )
    [D] Check out my latest article on how the new improvements in GPT-4V(ision) can bring on a new ear of computer vision models, fine-tuned on outputs of GPT-4V(vision).
    https://medium.com/@rishiswethan.c.r/how-gpt-4v-ision-will-revolutionise-image-annotation-b0d3ace64bff?source=friends_link&sk=4be42541a8a8ee40e18ef14533342cfd submitted by /u/Remarkable_Seesaw_89 [link] [comments]  ( 8 min )
    How to object detection in Unity any good resources [D]
    I have tired barracuda, vuforia and it doesn’t work for some reason. And completely lost atm. It’s an object detection model to detect the circuit schematic symbols using computer vision submitted by /u/PreferenceFrosty2958 [link] [comments]  ( 9 min )
    [D] Running Large Language Models on CPU
    Fine-tuning large language models with the aim of obtaining a small but accurate model is extremely difficult. This is because you have to strike a balance between the model’s size and accuracy. Researchers from IST Austria & Neural Magic seem to have found a sweet spot. In their latest paper, they successfully applied sparse fine-tuning on MPT with remarkable performance. The MPT model was pruned to 75% without a drop in accuracy, showing performance that is on-par with quantization approaches. Particularly, the resulting sparse model can execute fast on CPUs by taking advantage of the sparsity. Instead of performing standard loss-based fine-tuning which may fail to recover accuracy, the researchers experiment with distillation-type losses. These losses are better at recovering accuracy at high sparsity. What’s impressive is that the sparse fine-tune LLM can achieve 7.7 tokens per second on a single core and 26.7 tokens per second on 4 cores of an cheap consumer AMD Ryzen CPU. The MPT-7B model was fine-tuned via SFT obtaining a dense baseline that showed remarkable performance. This baseline was later pruned with SparseGPT to 40% to 80% reaching 5X compression ratios. By applying SquareHead KD, FP32 models with 75% can be obtained with NO accuracy loss, outperforming cross-entropy and other KD methods. The paper is available on Arxiv. Sparse Finetuning for Inference Acceleration of Large Language Models: https://huggingface.co/papers/2310.06927 MPT Sparse Finetuned on GSM8k with DeepSparse Hugging Face Space: https://huggingface.co/spaces/neuralmagic/sparse-mpt-7b-gsm8k submitted by /u/mwitiderrick [link] [comments]  ( 9 min )
    [P] I built an AI Writing Coach to proofread your work
    submitted by /u/hungryillini [link] [comments]  ( 8 min )
    [R] Conceptual Framework for Autonomous Cognitive Entities - Clemson University 2023 - Introducing the ACE Framework
    Paper: https://arxiv.org/abs/2310.06775 GitHub: https://github.com/daveshap/ACE_Framework Blog post: https://medium.com/@dave-shap/autonomous-agents-are-here-introducing-the-ace-framework-a180af15d57c Abstract: The rapid development and adoption of Generative AI (GAI) technology in the form of chatbots such as ChatGPT and Claude has greatly increased interest in agentic machines. This paper introduces the Autonomous Cognitive Entity (ACE) model, a novel framework for a cognitive architecture, enabling machines and software agents to operate more independently. Drawing inspiration from the OSI model, the ACE framework presents layers of abstraction to conceptualize artificial cognitive architectures. The model is designed to harness the capabilities of the latest generative AI technologies, including large language models (LLMs) and multimodal generative models (MMMs), to build autonomous, agentic systems. The ACE framework comprises six layers: the Aspirational Layer, Global Strategy, Agent Model, Executive Function, Cognitive Control, and Task Prosecution. Each layer plays a distinct role, ranging from setting the moral compass and strategic thinking to task selection and execution. The ACE framework also incorporates mechanisms for handling failures and adapting actions, thereby enhancing the robustness and flexibility of autonomous agents. This paper introduces the conceptual framework and proposes implementation strategies that have been tested and observed in industry. The goal of this paper is to formalize this framework so as to be more accessible. ​ https://preview.redd.it/7scnwk5a5dub1.png?width=850&format=png&auto=webp&s=371b5b02a453dcad3e70a2600cc2d625eda44133 ​ submitted by /u/Prior-Travel3670 [link] [comments]  ( 9 min )
    [D] Fine tune Llama2 with Lora for foreign language
    Hey folks, I watched a YouTube video, about how some LLMs tokenise languages other than English. For example for the Greek language you will see that this is failing totally, as one character is one token always: ​ https://preview.redd.it/835p97cyhcub1.png?width=1900&format=png&auto=webp&s=944b150cc0fc112cb8cd2bac600f6fcdcc85fb1e My question is, if I would fine-tune it with Alpaca Lora based on Greek text, would the tokeniser change and work properly? Or the fine tune would not work as the tokeniser cannot be retrained/tuned? submitted by /u/kostakos14 [link] [comments]  ( 9 min )
    [D] Advice for applying to undergraduate research internships?
    Hello, I’m a 3rd year data science and linguistics major at a top 30 school looking to land an internship at industry research. I’d say I’m fairly competitive. Extensive research experience. 2nd author at EMNLP, and did an REU at a prestigious institute. I’m already looking at some places such as AI2, but I’m curious if there are other internships I should be aware of. submitted by /u/Kai_151 [link] [comments]  ( 9 min )
    [D] The history of neural network is over. J. Schimdhuber proposes a giant network that includes all future neural network architecture as a subcomponent.
    submitted by /u/fromnighttilldawn [link] [comments]  ( 8 min )
    [P] How do I make my CNN more efficient?
    I've been trying a variety of pre-constructed and self-made U-net-like CNNs. Had a few questions: When using torch summary, is there a general formula for estimating a model's inference time/backprop time and required GPU ram based on the information torch summary gives ( Total params, Trainable params, Non-trainable params, Total mult-adds (G), Input size, Forward/backward pass size (MB), Params size (MB), Estimated Total Size (MB)), and other hyper-parameters such as batch size? Why is my self-made model (which has smaller quantities in all the parameters torch summary outputs) requiring more GPU ram AND taking more time for inference and backprop? Is the coding style for the model's class and its forward prop a huge factor here? If so, could you please provide tips for making my code more efficient? Here's the notebook showcasing a pre-made model from MONAI and two of my self-made models: https://colab.research.google.com/drive/1VRrdnzaAbp25_DtaWTKHW5JxzyhmueMC?usp=sharing I've also listed some of my observation on the models and their results in the notebook. Any ideas or suggestion would be much appreciated. submitted by /u/mimivirus2 [link] [comments]  ( 9 min )
    [P] Made a Python package for creating API endpoints with dynamic queries.
    submitted by /u/squirrels-api [link] [comments]  ( 9 min )
    [R] Supercharging reinforcement learning with logic
    Deep reinforcement learning has led to a variety of compelling results. However, performance issues, particularly relating to the data efficiency of simulation has limited it applicability in domains where simulations run more slowly. Our solution is to use a logic base framework, PyReason, as a proxy for the simulation. ​ https://preview.redd.it/kdhpu9qraaub1.png?width=1786&format=png&auto=webp&s=8155ba38fc66bd3a2fe934b1f395351c4db68e2f We showed that inference with PyReason logic program can provide up to a three order-of-magnitude speedup when compared with native simulations (we studied AFSIM and Starcraft2) while providing comparable reward and win rate (we found that PyReason-trained agents actually performed better than expected in both AFSIM and Starcraft2). ​ https://preview.…  ( 9 min )
    [D] transformers vs llama.cpp vs GPTQ vs GGML vs GGUF
    i am a little puzzled, i know that transformers is the HF framework/library to load infere and train models easily and that llama.cpp is another framework/library that does the more of the same but specialized in models that runs on CPU and quanitized and run much faster i understand that GGML is a file format for saving model parameters in a single file, that its an old problematic format, and GGUF is the new kid on the block, and GPTQ is the same quanitized file format for models that runs on GPU ​ so here is what i can't understand (assuming i got all the rest correct): does HF Transformers support loading GGUF or GGML models ? and does GGUF needs a tokenizer json or does the data comes from within the gguf file itself and is safetensors (another file format) supported by both Transformers and Llama.cpp ​ since i cannot find python examples for these combination i assume all the answers are - No ​ can anyone shed some light ? submitted by /u/Particular_Flower_12 [link] [comments]  ( 9 min )
  • Open

    Hi everyone , I was following an online RL tutorial that uses Stable baselines3 and Open AI's gym to implement a Cart Pole environment but I have ran into some problems. Can anyone of you please help me?
    I was following Nicholas Renotte's RL in 3 hours tutorial and I ran into this issue at time stamp 1:10:00 while testing my trained Agent. ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part. This is my code for testing my environment: episodes=5 for episode in range(1,episodes+1): obs=env.reset() done=False score=0 print(obs) while not done: env.render() action, _ = model.predict(obs) #Now using model here obs, reward, done, truncated, info = env.step(action) score += reward print('Episode:{} Score:{}'.format(episode,score)) env.close() And this is the environment I am using : environment_name = 'CartPole-v1' env=gym.make(environment_name,render_mode="human") the model variable has my trained model stored in it and is initialized as such : model =PPO.load(PPO_Path, env=env) The print(obs) function returns this value : (array([ 0.03954345, -0.04975226, -0.02942382, -0.02261402], dtype=float32), {}) I am running this code in a Notebook on VS code on an M2 Macbook running MacOS 13.5, I am using Python 3.9.15 and the latest version of all the other libraries and dependencies. Please help submitted by /u/Straight-Knowledge83 [link] [comments]  ( 9 min )
    Reinforcement Learning Platform for UAVs
    I'm doing a project that aims to use reinforcement learning (PPO variations) with UAVs. What are the most up to date tools are for implementing and trying new RL algorithms in this space? I've looked at AirSim, and it seems to no longer be supported by Micrsosoft. I've also been heavily looking at Flightmare, which is almost exactly what I want, but getting the tool that hasn't been maintained for years up and running is giving me headaches (and the documentation is not great/up to date either). Ultimately, what I'm looking for is: * Physics simulation * Photo-realistic vision * Built-in integration with Gym would be awesome * Python platform preferred, C++ also ok I've also used ROS/Gazebo with PyTorch previously, and that is my backup plan I suppose, but it's not photo-realistic and is kind of slow in my experience. submitted by /u/zeus_the_transistor [link] [comments]  ( 9 min )
    Training a RL Model with Continuous State & Action Space in a Real-World Scenario
    Hello everyone, I'm a Data Science student diving into an exciting thesis topic: using reinforcement learning to stabilize boats in rough seas by adjusting a keel's angle. But I am a bit concerned about the high complexity of the problem and the given situation: Action Space: Continuous, representing the keel's angle adjustments. State Space: Continuous, capturing the dynamic behavior of the sea, including waves. Training Environment: Currently, the company only has a real-world water tank setup to simulate the sea conditions. There's no computer simulation available. Given this setup, I have a couple of concerns: Is it possible to train an RL model effectively in such a complex real-world scenario without first having a computer simulation? And if yes, what would be your initial steps in doing so? Are there possibilities to reduce the problem's complexity while training exclusively in the real-world water tank simulation? (i.e. transforming the action space into a discrete action space?) Any insights or advice would be greatly appreciated! submitted by /u/No-Wasabi3556 [link] [comments]  ( 9 min )
    Supercharging reinforcement learning with logic
    Deep reinforcement learning has led to a variety of compelling results. However, performance issues, particularly relating to the data efficiency of simulation has limited it applicability in domains where simulations run more slowly. Our solution is to use a logic base framework, PyReason, as a proxy for the simulation. ​ https://preview.redd.it/6wmg0qnlaaub1.png?width=1786&format=png&auto=webp&s=01f82cf24de79b317b6f9406b0b6379b949a34d3 We showed that inference with PyReason logic program can provide up to a three order-of-magnitude speedup when compared with native simulations (we studied AFSIM and Starcraft2) while providing comparable reward and win rate (we found that PyReason-trained agents actually performed better than expected in both AFSIM and Starcraft2). ​ https://preview.redd.it/u8f44fskaaub1.png?width=1636&format=png&auto=webp&s=9509f03a936f41cd0131388564833b86a39c295a However, the benefits of our semantic proxy go well beyond performance. The use of temporal logic programming has two crucial beneficial by-products such as symbolic explainability and modularity. PyReason provides an explainable symbolic trace that captures the evolution of the environment in a precise manner while modularity allows us to add or remove aspects of the logic program – allowing for adjustments to the simulation based on a library of behaviors. PyReason is well-suited to model simulated environments for other reasons – namely the ability to directly capture non-Markovian relationships and the open-world nature (defaults are “uncertain” instead of true or false). We have demonstrated that agents can be trained using standard RL techniques such as DQN using this framework. Preprint: https://arxiv.org/abs/2310.06835 Video: https://youtu.be/9e6ZHJEJzgw Code for PyReason-as-a-Sim (integration with DQN): https://github.com/lab-v2/pyreason-rl-sim Code for PyReason Gym: https://github.com/lab-v2/pyreason-gym PyReason Home: neurosymbolic.asu.edu/pyreason/ ​ submitted by /u/Neurosymbolic [link] [comments]  ( 9 min )
    Actor-critic on piecewise constant reward function
    I made a environment with piece wise constant reward function for testing the network architecture. And its episode length is 1. The critic will try to learn this and become a piecewise constant function. And have a gradient close to 0 making the gradient vanish for the policy. I can think of some solutions: - Change the reward function to a dense reward But i wanted some other views; has anyone solved such problems? submitted by /u/Automatic-Web8429 [link] [comments]
    Help understanding the PETS algorithm
    I am trying to read this paper and I am unable to get the big picture over here. Can someone please explain what's going on in the Propagation and Planning stage? In the Model stage, I understand that they are using a Probabilistic Model to handle uncertainty. ​ https://preview.redd.it/idenqd492aub1.png?width=945&format=png&auto=webp&s=40da9bf53b21dbed63b70571f3833b0fe3a9dabb For instance, what does Particle mean in this paper? This big picture here is that I am trying to understand the Model Based Policy Optimization paper and it seemed like they built upon the above paper. submitted by /u/Academic-Rent7800 [link] [comments]
  • Open

    Supercharging reinforcement learning with logic
    Deep reinforcement learning has led to a variety of compelling results. However, performance issues, particularly relating to the data efficiency of simulation has limited it applicability in domains where simulations run more slowly. Our solution is to use a logic base framework, PyReason, as a proxy for the simulation. ​ https://preview.redd.it/pmukb2k7aaub1.png?width=1786&format=png&auto=webp&s=3fb36d0fbeb75393ae8f71f8f369ff5e0b79fbcb We showed that inference with PyReason logic program can provide up to a three order-of-magnitude speedup when compared with native simulations (we studied AFSIM and Starcraft2) while providing comparable reward and win rate (we found that PyReason-trained agents actually performed better than expected in both AFSIM and Starcraft2). ​ https://preview.…

  • Open

    [D] Detect anomaly with small dataset
    Hi guys, I'm hoping for advice on the direction to detect detect pattern/ anomaly at small scale. I understand there are certain tools out there for webpage monitoring, but let's say this is just an example that I'm ingesting small amount of hourly/daily traffic to a sub webpage on my site (anywhere from 50-100 visits per day, this may mean max ~30 visits/per hour) There are times when traffic to the page drops as the page doesn't fully load , or the other page on which I'm hosting the link to this page doesn't load resulting in people can't see the link tothis sub page). Giving the scope/scale of this, amount of the data, it's not possible for me to use other solutions for anomaly detection (those that costs like $100-$1000+/month) and I'm not sure where to start with ML with this minimal amount of hourly/daily data to monitor. Is there anything that I should look into? Thank you submitted by /u/duyth [link] [comments]  ( 9 min )
    [D] Foundational must reads for LLMs
    Came across this post https://community.openai.com/t/foundational-must-read-gpt-llm-papers/197003 As I am new to LLM's , Please share your thoughts on how to start and what subtopics to learn in depth ? ​ submitted by /u/Electrical_Study_617 [link] [comments]  ( 8 min )
    [D] Google AutoML Alternatives?
    Having jumped into AI this last year I've used Google AutoML a lot and it's honestly worked great. I primarily use it for text classification. Training usually takes anywhere from 4-8 hours. The results have been above 90% accurate on interference. Now, the problem. Cost. It's super expensive to run an endpoint for predictions with Google AutoML, for text classification. I'm wondering if anyone has any alternatives or ideas for similar results for cheaper. I am ok waiting for prediction results a bit as I don't need sub 1ms type responses lol. But everything I've tried has yielded less then optimal results. Tried various hugging face models, and accuracy is about 50%. submitted by /u/zepaz [link] [comments]  ( 9 min )
    [R] Octopus: Embodied Vision-Language Programmer from Environmental Feedback - Nanyang Technological University 2023 - Continually refines its understanding and execution, demonstrating impressive adaptability!
    Paper: https://arxiv.org/abs/2310.08588 Blog: https://choiszt.github.io/Octopus/ Github: https://github.com/dongyh20/Octopus Youtube short: https://www.youtube.com/watch?v=lHbTvB0yIP4 Abstract: Large vision-language models (VLMs) have achieved substantial progress in multimodal perception and reasoning. Furthermore, when seamlessly integrated into an embodied agent, it signifies a crucial stride towards the creation of autonomous and context-aware systems capable of formulating plans and executing commands with precision. In this paper, we introduce Octopus, a novel VLM designed to proficiently decipher an agent's vision and textual task objectives and to formulate intricate action sequences and generate executable code. Our design allows the agent to adeptly handle a wide spectrum of tasks, ranging from mundane daily chores in simulators to sophisticated interactions in complex video games. Octopus is trained by leveraging GPT-4 to control an explorative agent to generate training data, i.e., action blueprints and the corresponding executable code, within our experimental environment called OctoVerse. We also collect the feedback that allows the enhanced training scheme of Reinforcement Learning with Environmental Feedback (RLEF). Through a series of experiments, we illuminate Octopus's functionality and present compelling results, and the proposed RLEF turns out to refine the agent's decision-making. By open-sourcing our model architecture, simulator, and dataset, we aspire to ignite further innovation and foster collaborative applications within the broader embodied AI community. https://preview.redd.it/1zn9q3g7a8ub1.jpg?width=1651&format=pjpg&auto=webp&s=3b14f862b24784918d6b4514bf575cf29bc65edf https://preview.redd.it/sv2y06g7a8ub1.jpg?width=1079&format=pjpg&auto=webp&s=be9ab7dd7cf23018b6d1fa0c584ad301b04c8abf https://preview.redd.it/350xc6g7a8ub1.jpg?width=942&format=pjpg&auto=webp&s=53e57541d35ca23d06b8c5be71c2b0c1910fdf90 ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [R] My article about autonomous LLMs-based agents: Chain of Thought, Plan and Solve, Self-Ask, ReAct, Reflexion, Self-Consistency, ToT, and GoT; and intrinsic insights behind an autonomous LLMs-based agents.
    A Complete Guide to LLMs-based Autonomous Agents (Part I): https://medium.com/p/69515c016792 My article offers a comprehensive overview of LLM-based agents, covering Chain of Thought, Plan and Solve/Execute, Self-Ask, ReAct, Reflexion, Self-Consistency, Tree of Thoughts, and Graph of Thoughts. It traces their evolution from basic forms, driven primarily by prompt engineering, to advanced models that emulate human problem-solving intricacies. Moreover, it provides an engineer's insights into the architecture behind these autonomous agents. Naturally suitable for AI agent: LLMs feature a natural language interface tailored for user-computer interactions and they come equipped with innate reasoning abilities. LLM's Deficiency: Despite its strengths, GPT-4 can provide incorrect answers or hallucinations for complex tasks. Challenges with Training: Finetuning pretrained LLMs doesn't enhance reasoning capabilities. While creating a larger LLM can bolster its problem-solving skills, the process can span several months to a year, potentially leading to a two-year wait before its official launch. Closed Model and RAG: LLMs, once trained, are unable to fetch real-time data and have inherent shortcomings. However, for Q&A tasks, leveraging an open-book method proves more effective. The aim is not to have an all-knowing model but one skilled in reasoning and utilizing tools. LLM Agent Approach: We direct LLMs to break down intricate tasks, tackle individual sub-tasks, evaluate them, and make revisions of the strategy as needed. submitted by /u/Appropriate-Map-9923 [link] [comments]  ( 9 min )
    [D] Have a research paper to do for my masters in Big Data Analytics. Wanted to do something with ML. Just look for some advice.
    In my last semester and we have to pick a topic related to big data analytics. Right now I have to prepare a proposal for my topic. My topic will have to do with something to ML and the medicinal field. Current plan: Get a dataset related to my topic. Right now its Parkinson's disease. My question is, for the dataset would I need a dataset with text data or would images of scans of the brain be better for detecting say early detection be better? I cant figure out which would be the better dataset. Get the dataset and then use Azure machine learning to prepare my dataset and do some data cleaning and handling and then get a model out of it. I picked azure because I have azure license from my uni and after searching about, I read about the azure machine learning service. Would azure be a good choice for training my model on this task? I've mostly used google colab for training small models. Once the model is trained and setup. I want to setup a front end web app (flask) and then setup my model so that users can upload either text data or image scans and then model would output results regarding the inputted data. My question is, would it be ideal to have the model located on my local machine or would azure let me do api calls between my local to the azure trained model? Would all this be feasible to do? I'm not looking to develop a full fledge application, just want to create a model with a dataset of images or text and then be able to feed new images to the trained model and get an output. Just looking for opinions or advice on this topic. Thanks. submitted by /u/Jesustakethewheeeeel [link] [comments]  ( 9 min )
    [D] Is the topic of your ML PhD important?
    I read the previous discussion on whether a PhD is required in the field, and I had a follow-up question: does the topic of your PhD matter? So let’s say you finish a PhD in the field of medical machine learning (non-CV), would an automotive company, FAANG, or e.g. DeepMind still like to hire you once you would like to switch your sub-field a bit? Or are you simply less desirable than a candidate without a PhD but more experience in CV? I am asking this because I would like to stay flexible as I have many ML sub-fields I want to work in, and I do not want to limit my options by pursuing a PhD in a topic that I don’t want work in for my entire life. For context, I do already have 2 years working experience as an AI engineer and I am finishing my AI master’s. submitted by /u/Otoz123 [link] [comments]  ( 9 min )
    [D] Fine-Tuning tortoise tts
    I'm planning on creating my own AI voice to use with ChatGPT. I have done my research, and there are two ways to achieve a quality TTS model to use. I have tried them both. I fine-tuned tortoise tts on my own 20-minute dataset. I have also tried to create a model using Tacotron2 and the dataset. The quality of the fine-tuned model is better. But one downside is that I still have to give the fine-tuned tortoise model a reference voice for it to choose the voice that I fine-tuned it with. On the other hand, the trained model didn't need to. The question here is: why didn't the tortoise model choose the voice in the dataset as the default? Do I need to expand my dataset for it to be chosen as the main voice? ​ Thank all. submitted by /u/Capital_Birthday_654 [link] [comments]  ( 9 min )
    [R] Do pretrained Transformers Really Learn In-context by Gradient Descent?
    Do pretrained Transformers Really Learn In-context by Gradient Descent? https://x.com/Shadowkiller331/status/1713003711629516862?s=20 ​ https://preview.redd.it/zpwkh47hm7ub1.png?width=450&format=png&auto=webp&s=6def807c9c9f605e3f7839159db3402d837f6895 submitted by /u/Educational-Newt2052 [link] [comments]  ( 8 min )
    [R] Unlocking the power of Sparsity in Generative Models: 8x Faster LLMs on CPUs with Sparse Fine Tuning
    submitted by /u/markurtz [link] [comments]  ( 8 min )
    A[R]xiv [D]ives - Llama 2 Deep Dive
    We’ve been diving deep into foundational papers on Fridays as a group. It’s been helpful for us to get into the nitty gritty details of these papers, so hope you find it helpful too. Would love to have anyone join the discussion next week! submitted by /u/FallMindless3563 [link] [comments]  ( 9 min )
    [N] Most detailed human brain map ever contains 3,300 cell types
    What can this mean to artificial neural networks? submitted by /u/hhh888hhhh [link] [comments]  ( 8 min )
    [D] My fine tune behaves like the base model
    Hi all, I did a fine tune of CodeLlama-7b on a custom dataset and I was getting very excited because it was doing very well on evals. I saved the model with model.save() and model.push_to_hub() and it seemed to work. When I load the model it shows the structure with the Lora_A and Lora_B for every layer, but it now acts like the base model with no changes. Is it possible I saved wrong or likely that I am loading wrong? Any help is greatly appreciated! submitted by /u/cstein123 [link] [comments]  ( 9 min )
    [R] Machine Learning Courses Mega Bundle from Mammoth Interactive
    submitted by /u/brand_momentum [link] [comments]  ( 8 min )
    [R] best RL algorithm for a single turn game ?
    Hi there, I'm new to Reinforcement Learning (RL), and the papers I've come across mainly focus on scenarios where states change with choices in a game. However, I'm interested in finding the best RL algorithm for a simpler case. I have an input I and a policy P. P outputs probabilities for available choices (a limited set of integers), and a reward r is given for each choice (the reward is costly to compute that’s why I use RL). The goal is to train P to maximize the reward. So as if we are in a game that ends after only one choice. Any recommendations for the best RL algorithm in this case? Thanks! submitted by /u/Meddhouib10 [link] [comments]  ( 9 min )
    [D] Ways to get research experience before grad school
    I recently graduated with my bachelor's from a low ranked school with a good GPA. I was planning on starting a PhD studying NLP and applied to 12 mid level schools. However, I was unfortunately rejected from all the schools I applied to. I suspect it was likely due to my lack of experience in NLP research as my school didn't have any professors who do research in that area. My current plan is to work in industry for the next two years and try and do some NLP research on the side before reapplying. Do any NLP labs allow for external volunteer researchers? Besides that, are there any other ways to get research experience? submitted by /u/Bananas970 [link] [comments]  ( 9 min )
    [D] Validation loss is decreasing but WER is increasing in Whisper model training.
    Hi, I've been using the Huggingface library to fine-tune the Whisper model. While the WER was initially decreasing, I've noticed it began to rise even though the validation loss continues to drop. Could the issue be related to my testing on a very small dataset? As shown in the image, after 80th step the wer suddenly started increasing from 13 -> 28 https://preview.redd.it/xq2bm0oyh5ub1.png?width=838&format=png&auto=webp&s=136447f527bea6880b46ae588463500304b1d6bb ​ submitted by /u/aadityaura [link] [comments]  ( 9 min )
    Looking for An Easy-To-Use API To Train Image Model [D]
    Yo! I have some images I curated on MJ, I want to run these together into an AI and spit out more outputs like these. The current process has me get maybe .2% successful outputs through MJ I figure the next step to more outputs is training a custom model. What's the easiest way to do this using a web-based API? Does this involve using Stable Diffusion? submitted by /u/AdministrativePie991 [link] [comments]  ( 9 min )
    [D] Time Series Forecasting on positive AND negative Examples
    Hey 😀 not sure if extremely trivial or really tricky. In the end, I want a machine that generates a time series without further input based on training data, generating a new time series every time. I want this to be based on a transformer. I want it trained with data looking like this: 2023-07-03 14:19:48,GOOD 2023-07-04 13:59:07,GOOD 2023-07-05 01:58:54,GOOD 2023-07-05 03:30:05,BAD 2023-07-05 05:17:43,BAD 2023-07-06 05:35:34,GOOD 2023-07-07 14:06:03,GOOD 2023-07-08 21:16:05,BAD with “GOOD” and “BAD” being the state of the system which is likely dependent on the time series data up to that point. I have a lot of data and it’s data points like the one above with maybe a hundred rows of data on average for a few thousand systems. Every system is independent of all others but all are identical. I do not want to train only on “GOOD” as this would leave out a lot of valuable data … Is there a way to train a time series transformer with both data that leads to GOOD as well as BAD outcomes, so it would generate time series from scratch that are unlikely to have BAD outcomes? Thank you!! submitted by /u/_VeniVidiVeni_ [link] [comments]  ( 9 min )
    [P] VGSLify: Transform Your TensorFlow Model Prototyping Experience
    Hey r/MachineLearning! 🚀 Have you ever been frustrated with the lengthy and sometimes cumbersome TensorFlow code for defining models? Or wished you could experiment with different architectures without dealing with copious lines of code? That's where VGSLify steps in. Why Use VGSLify? Compact Definitions: VGSLify leverages VGSL spec, enabling you to express intricate model architectures in a compact and elegant manner. This means you can quickly experiment with different models by simply tweaking a string format, bypassing the verbose code traditionally required. Swift Prototyping: Craft intricate neural network architectures using succinct VGSL spec strings, allowing you to iterate faster and more efficiently. From TensorFlow to VGSL: Got a pre-existing TensorFlow model? Easily co…  ( 9 min )
    [D] How important is a PhD for industry?
    I'm 21 years old and currently pursuing a master's degree in theoretical physics in the UK. I have a strong interest in machine learning and have completed many computing courses as well as independent projects in this field. I'm considering a career in machine learning and I'm curious about the benefits of doing a PhD. I've heard that the salary difference may not be substantial. Could anyone provide insights on how important a PhD is for specific roles in this field? Additionally, what factors should I consider when deciding whether to pursue a PhD in machine learning, apart from my passion for ML? Also are private PhDs common in ML. Working in a company and asked them to pursue a PhD within the company? Thanks :) submitted by /u/Neat-Print2792 [link] [comments]  ( 9 min )
    [D] SHAP mask_token: why does it matter and which one to choose?
    submitted by /u/Being-Nothingness [link] [comments]  ( 9 min )
    [R] Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
    submitted by /u/hardmaru [link] [comments]  ( 8 min )
    [D] is there a good Code or Text model ?
    i am trying to detect code segments in a text response of an LLM, so i can highlight them using Highlight,JS, ​ is there a good model that can do the classification of a block of text and decide if it is a block of code or a block of NLP simple text (english) ? submitted by /u/Particular_Flower_12 [link] [comments]  ( 9 min )
  • Open

    Looking for developers / future founders who want to build and grow disruptive AI apps.
    I am a multi-time founder myself. I've secured millions from investors for my past startups and had notable success with a video app that gathered 4M users and $300k in revenue. However, due to the intense competition in the video app editing sector, my team and I couldn't turn a profit. After my last startup faltered during the covid period, I transitioned to being a full-time product-market fit and growth marketing consultant and have made really great money doing it. I assist new startups in avoiding the mistakes I made and implement frameworks that significantly increase their chances of success. I've observed that many new founders venture into startups without fully grasping the challenges of building something people genuinely desire. It’s really not easy. How would you know what y…
    AI Images Detectors Are Being Used to Discredit the Real Horrors of War
    A free AI image detector is being used to discredit a photograph of a burnt corpse of a baby killed in Hamas's attack on Israel. However, experts have pointed out that the image does not show any signs of being created by AI. The idea that the image is AI-generated has spread on Twitter, suggesting that official Israeli accounts are spreading AI-generated misinformation. AI image generators have trouble replicating reality accurately, and the shadows in the photograph are consistent with a real image. Multiple AI image detection tools have also determined that the image is not AI-generated. Source : https://www.404media.co/ai-images-detectors-are-being-used-to-discredit-the-real-horrors-of-war/ submitted by /u/NuseAI [link] [comments]
    Seeking a Community for Open-Source AI Code Generation Models
    Hello everyone! 🌟 I hope this post finds you well. I've been delving deeper into the world of AI code generation recently and am curious to discover if there are communities or platforms specifically dedicated to open-source AI code generation models. I'm aware of hugging face but is there any other besides that. Here's what I'm looking for: Collaboration: A space where enthusiasts and experts alike can collaborate on projects, share insights, and improve upon existing models. Discussion: Forums or chat platforms where discussions around the challenges, breakthroughs, and best practices in AI code generation take place. Resource Sharing: A repository or platform where open-source models, datasets, and related tools can be freely shared and accessed. Learning and Tutorials: Any resources that can help newcomers grasp the concepts and intricacies of AI code generation. If you know of any such community or are part of one, please do let me know. submitted by /u/akanshtyagi [link] [comments]
    Mickey, what are you doing?
    submitted by /u/LeviJr00 [link] [comments]
    Updates to my Capstone Project with Enhanced Features and still freely available to all (until OpenAI credits deplete - Free ChatGPT4). Hoping to introduce the community feature too where people can generate STEM animations to aid learning
    submitted by /u/Raymondlkj [link] [comments]
    learning for school with an AI
    does anyone know if there is an AI online where you can import documents and the AI is forming and asking you questions about that topic on the document? like an AI who generates test for you to be prepared for every potential question in a school test. submitted by /u/satanskittenz [link] [comments]
    Creative Question: Your ideas for AI generative reality
    Ok so we have AI generated content, First text, then images, then videos. What will the world look like when we have a generative world? Generative objects, Generative Games, Generative Moods, Generative memories, Generative senses and perceptions, Generative Environments, Generative Reality. Anyone want to talk about what it might look like? ( I would like to hear a unhinged idea for what might happen, Speculative of course ) submitted by /u/rolyataylor2 [link] [comments]
  • Open

    AI’s Kryptonite: Data Quality
    The ability of Generative AI (GenAI) tools to deliver accurate and reliable outputs entirely depends on the accuracy and reliability of the data used to train the Large Language Models (LLMs) that power the GenAI tool. Unfortunately, the Law of GIGO – Garbage In, Garbage Out – threatens the widespread adoption of GenAI.  Whether generating… Read More »AI’s Kryptonite: Data Quality The post AI’s Kryptonite: Data Quality appeared first on Data Science Central.  ( 22 min )
  • Open

    "Pitfalls of learning a reward function online", Armstrong et al 2020 {DM}
    submitted by /u/gwern [link] [comments]
  • Open

    Newton line
    Let Q be a convex quadrilateral with at most two parallel sides. Draw the two diagonals then draw a line through their midpoints. This line is called the Newton line. (The requirement that at most two sides are parallel insures that the midpoints are distinct and so there is a unique line joining them.) In […] Newton line first appeared on John D. Cook.  ( 5 min )
  • Open

    FABind: Fast and Accurate Protein-Ligand Binding. (arXiv:2310.06763v2 [cs.LG] UPDATED)
    Modeling the interaction between proteins and ligands and accurately predicting their binding structures is a critical yet challenging task in drug discovery. Recent advancements in deep learning have shown promise in addressing this challenge, with sampling-based and regression-based methods emerging as two prominent approaches. However, these methods have notable limitations. Sampling-based methods often suffer from low efficiency due to the need for generating multiple candidate structures for selection. On the other hand, regression-based methods offer fast predictions but may experience decreased accuracy. Additionally, the variation in protein sizes often requires external modules for selecting suitable binding pockets, further impacting efficiency. In this work, we propose $\mathbf{FABind}$, an end-to-end model that combines pocket prediction and docking to achieve accurate and fast protein-ligand binding. $\mathbf{FABind}$ incorporates a unique ligand-informed pocket prediction module, which is also leveraged for docking pose estimation. The model further enhances the docking process by incrementally integrating the predicted pocket to optimize protein-ligand binding, reducing discrepancies between training and inference. Through extensive experiments on benchmark datasets, our proposed $\mathbf{FABind}$ demonstrates strong advantages in terms of effectiveness and efficiency compared to existing methods. Our code is available at $\href{https://github.com/QizhiPei/FABind}{Github}$.  ( 2 min )
    On Extreme Value Asymptotics of Projected Sample Covariances in High Dimensions with Applications in Finance and Convolutional Networks. (arXiv:2310.08150v1 [math.ST])
    Maximum-type statistics of certain functions of the sample covariance matrix of high-dimensional vector time series are studied to statistically confirm or reject the null hypothesis that a data set has been collected under normal conditions. The approach generalizes the case of the maximal deviation of the sample autocovariances function from its assumed values. Within a linear time series framework it is shown that Gumbel-type extreme value asymptotics holds true. As applications we discuss long-only mimimal-variance portfolio optimization and subportfolio analysis with respect to idiosyncratic risks, ETF index tracking by sparse tracking portfolios, convolutional deep learners for image analysis and the analysis of array-of-sensors data.  ( 2 min )
    GRASP: Accelerating Shortest Path Attacks via Graph Attention. (arXiv:2310.07980v1 [cs.LG])
    Recent advances in machine learning (ML) have shown promise in aiding and accelerating classical combinatorial optimization algorithms. ML-based speed ups that aim to learn in an end to end manner (i.e., directly output the solution) tend to trade off run time with solution quality. Therefore, solutions that are able to accelerate existing solvers while maintaining their performance guarantees, are of great interest. We consider an APX-hard problem, where an adversary aims to attack shortest paths in a graph by removing the minimum number of edges. We propose the GRASP algorithm: Graph Attention Accelerated Shortest Path Attack, an ML aided optimization algorithm that achieves run times up to 10x faster, while maintaining the quality of solution generated. GRASP uses a graph attention network to identify a smaller subgraph containing the combinatorial solution, thus effectively reducing the input problem size. Additionally, we demonstrate how careful representation of the input graph, including node features that correlate well with the optimization task, can highlight important structure in the optimization solution.  ( 2 min )
    GenTKG: Generative Forecasting on Temporal Knowledge Graph. (arXiv:2310.07793v1 [cs.CL])
    The rapid advancements in large language models (LLMs) have ignited interest in the temporal knowledge graph (tKG) domain, where conventional carefully designed embedding-based and rule-based models dominate. The question remains open of whether pre-trained LLMs can understand structured temporal relational data and replace them as the foundation model for temporal relational forecasting. Therefore, we bring temporal knowledge forecasting into the generative setting. However, challenges occur in the huge chasms between complex temporal graph data structure and sequential natural expressions LLMs can handle, and between the enormous data sizes of tKGs and heavy computation costs of finetuning LLMs. To address these challenges, we propose a novel retrieval augmented generation framework that performs generative forecasting on tKGs named GenTKG, which combines a temporal logical rule-based retrieval strategy and lightweight parameter-efficient instruction tuning. Extensive experiments have shown that GenTKG outperforms conventional methods of temporal relational forecasting under low computation resources. GenTKG also highlights remarkable transferability with exceeding performance on unseen datasets without re-training. Our work reveals the huge potential of LLMs in the tKG domain and opens a new frontier for generative forecasting on tKGs.  ( 2 min )
    Neural Combinatorial Optimization with Heavy Decoder: Toward Large Scale Generalization. (arXiv:2310.07985v1 [cs.LG])
    Neural combinatorial optimization (NCO) is a promising learning-based approach for solving challenging combinatorial optimization problems without specialized algorithm design by experts. However, most constructive NCO methods cannot solve problems with large-scale instance sizes, which significantly diminishes their usefulness for real-world applications. In this work, we propose a novel Light Encoder and Heavy Decoder (LEHD) model with a strong generalization ability to address this critical issue. The LEHD model can learn to dynamically capture the relationships between all available nodes of varying sizes, which is beneficial for model generalization to problems of various scales. Moreover, we develop a data-efficient training scheme and a flexible solution construction mechanism for the proposed LEHD model. By training on small-scale problem instances, the LEHD model can generate nearly optimal solutions for the Travelling Salesman Problem (TSP) and the Capacitated Vehicle Routing Problem (CVRP) with up to 1000 nodes, and also generalizes well to solve real-world TSPLib and CVRPLib problems. These results confirm our proposed LEHD model can significantly improve the state-of-the-art performance for constructive NCO. The code is available at https://github.com/CIAM-Group/NCO_code/tree/main/single_objective/LEHD.  ( 2 min )
    Variational operator learning: A unified paradigm marrying training neural operators and solving partial differential equations. (arXiv:2304.04234v2 [cs.LG] UPDATED)
    Neural operators as novel neural architectures for fast approximating solution operators of partial differential equations (PDEs), have shown considerable promise for future scientific computing. However, the mainstream of training neural operators is still data-driven, which needs an expensive ground-truth dataset from various sources (e.g., solving PDEs' samples with the conventional solvers, real-world experiments) in addition to training stage costs. From a computational perspective, marrying operator learning and specific domain knowledge to solve PDEs is an essential step in reducing dataset costs and label-free learning. We propose a novel paradigm that provides a unified framework of training neural operators and solving PDEs with the variational form, which we refer to as the variational operator learning (VOL). Ritz and Galerkin approach with finite element discretization are developed for VOL to achieve matrix-free approximation of system functional and residual, then direct minimization and iterative update are proposed as two optimization strategies for VOL. Various types of experiments based on reasonable benchmarks about variable heat source, Darcy flow, and variable stiffness elasticity are conducted to demonstrate the effectiveness of VOL. With a label-free training set and a 5-label-only shift set, VOL learns solution operators with its test errors decreasing in a power law with respect to the amount of unlabeled data. To the best of the authors' knowledge, this is the first study that integrates the perspectives of the weak form and efficient iterative methods for solving sparse linear systems into the end-to-end operator learning task.  ( 3 min )
    Diffusion-based Generative AI for Exploring Transition States from 2D Molecular Graphs. (arXiv:2304.12233v3 [physics.chem-ph] UPDATED)
    The exploration of transition state (TS) geometries is crucial for elucidating chemical reaction mechanisms and modeling their kinetics. Recently, machine learning (ML) models have shown remarkable performance for prediction of TS geometries. However, they require 3D conformations of reactants and products often with their appropriate orientations as input, which demands substantial efforts and computational cost. Here, we propose a generative approach based on the stochastic diffusion method, namely TSDiff, for prediction of TS geometries just from 2D molecular graphs. TSDiff outperformed the existing ML models with 3D geometries in terms of both accuracy and efficiency. Moreover, it enables to sample various TS conformations, because it learned the distribution of TS geometries for diverse reactions in training. Thus, TSDiff was able to find more favorable reaction pathways with lower barrier heights than those in the reference database. These results demonstrate that TSDiff shows promising potential for an efficient and reliable TS exploration.  ( 2 min )
    LLMMaps -- A Visual Metaphor for Stratified Evaluation of Large Language Models. (arXiv:2304.00457v3 [cs.CL] UPDATED)
    Large Language Models (LLMs) have revolutionized natural language processing and demonstrated impressive capabilities in various tasks. Unfortunately, they are prone to hallucinations, where the model exposes incorrect or false information in its responses, which renders diligent evaluation approaches mandatory. While LLM performance in specific knowledge fields is often evaluated based on question and answer (Q&A) datasets, such evaluations usually report only a single accuracy number for the dataset, which often covers an entire field. This field-based evaluation, is problematic with respect to transparency and model improvement. A stratified evaluation could instead reveal subfields, where hallucinations are more likely to occur and thus help to better assess LLMs' risks and guide their further development. To support such stratified evaluations, we propose LLMMaps as a novel visualization technique that enables users to evaluate LLMs' performance with respect to Q&A datasets. LLMMaps provide detailed insights into LLMs' knowledge capabilities in different subfields, by transforming Q&A datasets as well as LLM responses into an internal knowledge structure. An extension for comparative visualization furthermore, allows for the detailed comparison of multiple LLMs. To assess LLMMaps we use them to conduct a comparative analysis of several state-of-the-art LLMs, such as BLOOM, GPT-2, GPT-3, ChatGPT and LLaMa-13B, as well as two qualitative user evaluations. All necessary source code and data for generating LLMMaps to be used in scientific publications and elsewhere is available on GitHub: https://github.com/viscom-ulm/LLMMaps  ( 3 min )
    A general framework for multi-step ahead adaptive conformal heteroscedastic time series forecasting. (arXiv:2207.14219v9 [stat.ML] UPDATED)
    This paper introduces a novel model-agnostic algorithm called adaptive ensemble batch multi-input multi-output conformalized quantile regression (AEnbMIMOCQR} that enables forecasters to generate multi-step ahead prediction intervals for a fixed pre-specified miscoverage rate in a distribution-free manner. Our method is grounded on conformal prediction principles, however, it does not require data splitting and provides close to exact coverage even when the data is not exchangeable. Moreover, the resulting prediction intervals, besides being empirically valid along the forecast horizon, do not neglect heteroscedasticity. AEnbMIMOCQR is designed to be robust to distribution shifts, which means that its prediction intervals remain reliable over an unlimited period of time, without entailing retraining or imposing unrealistic strict assumptions on the data-generating process. Through methodically experimentation, we demonstrate that our approach outperforms other competitive methods on both real-world and synthetic datasets. The code used in the experimental part and a tutorial on how to use AEnbMIMOCQR can be found at the following GitHub repository: https://github.com/Quilograma/AEnbMIMOCQR.  ( 3 min )
    Conditional Mutual Information for Disentangled Representations in Reinforcement Learning. (arXiv:2305.14133v2 [cs.LG] UPDATED)
    Reinforcement Learning (RL) environments can produce training data with spurious correlations between features due to the amount of training data or its limited feature coverage. This can lead to RL agents encoding these misleading correlations in their latent representation, preventing the agent from generalising if the correlation changes within the environment or when deployed in the real world. Disentangled representations can improve robustness, but existing disentanglement techniques that minimise mutual information between features require independent features, thus they cannot disentangle correlated features. We propose an auxiliary task for RL algorithms that learns a disentangled representation of high-dimensional observations with correlated features by minimising the conditional mutual information between features in the representation. We demonstrate experimentally, using continuous control tasks, that our approach improves generalisation under correlation shifts, as well as improving the training performance of RL algorithms in the presence of correlated features.  ( 2 min )
    Bengali Document Layout Analysis -- A YOLOV8 Based Ensembling Approach. (arXiv:2309.00848v2 [cs.CV] UPDATED)
    This paper focuses on enhancing Bengali Document Layout Analysis (DLA) using the YOLOv8 model and innovative post-processing techniques. We tackle challenges unique to the complex Bengali script by employing data augmentation for model robustness. After meticulous validation set evaluation, we fine-tune our approach on the complete dataset, leading to a two-stage prediction strategy for accurate element segmentation. Our ensemble model, combined with post-processing, outperforms individual base architectures, addressing issues identified in the BaDLAD dataset. By leveraging this approach, we aim to advance Bengali document analysis, contributing to improved OCR and document comprehension and BaDLAD serves as a foundational resource for this endeavor, aiding future research in the field. Furthermore, our experiments provided key insights to incorporate new strategies into the established solution.
    Flood and Echo: Algorithmic Alignment of GNNs with Distributed Computing. (arXiv:2310.06970v2 [cs.LG] UPDATED)
    Graph Neural Networks are a natural fit for learning algorithms. They can directly represent tasks through an abstract but versatile graph structure and handle inputs of different sizes. This opens up the possibility for scaling and extrapolation to larger graphs, one of the most important advantages of an algorithm. However, this raises two core questions i) How can we enable nodes to gather the required information in a given graph ($\textit{information exchange}$), even if is far away and ii) How can we design an execution framework which enables this information exchange for extrapolation to larger graph sizes ($\textit{algorithmic alignment for extrapolation}$). We propose a new execution framework that is inspired by the design principles of distributed algorithms: Flood and Echo Net. It propagates messages through the entire graph in a wave like activation pattern, which naturally generalizes to larger instances. Through its sparse but parallel activations it is provably more efficient in terms of message complexity. We study the proposed model and provide both empirical evidence and theoretical insights in terms of its expressiveness, efficiency, information exchange and ability to extrapolate.
    WiGenAI: The Symphony of Wireless and Generative AI via Diffusion Models. (arXiv:2310.07312v2 [cs.IT] UPDATED)
    Innovative foundation models, such as GPT-3 and stable diffusion models, have made a paradigm shift in the realm of artificial intelligence (AI) towards generative AI-based systems. In unison, from data communication and networking perspective, AI and machine learning (AI/ML) algorithms are envisioned to be pervasively incorporated into the future generations of wireless communications systems, highlighting the need for novel AI-native solutions for the emergent communication scenarios. In this article, we outline the applications of generative AI in wireless communication systems to lay the foundations for research in this field. Diffusion-based generative models, as the new state-of-the-art paradigm of generative models, are introduced, and their applications in wireless communication systems are discussed. Two case studies are also presented to showcase how diffusion models can be exploited for the development of resilient AI-native communication systems. Specifically, we propose denoising diffusion probabilistic models (DDPM) for a wireless communication scheme with non-ideal transceivers, where 30% improvement is achieved in terms of bit error rate. As the second application, DDPMs are employed at the transmitter to shape the constellation symbols, highlighting a robust out-of-distribution performance. Finally, future directions and open issues for the development of generative AI-based wireless systems are discussed to promote future research endeavors towards wireless generative AI (WiGenAI).
    OWAdapt: An adaptive loss function for deep learning using OWA operators. (arXiv:2305.19443v2 [cs.LG] UPDATED)
    In this paper, we propose a fuzzy adaptive loss function for enhancing deep learning performance in classification tasks. Specifically, we redefine the cross-entropy loss to effectively address class-level noise conditions, including the challenging problem of class imbalance. Our approach introduces aggregation operators, leveraging the power of fuzzy logic to improve classification accuracy. The rationale behind our proposed method lies in the iterative up-weighting of class-level components within the loss function, focusing on those with larger errors. To achieve this, we employ the ordered weighted average (OWA) operator and combine it with an adaptive scheme for gradient-based learning. Through extensive experimentation, our method outperforms other commonly used loss functions, such as the standard cross-entropy or focal loss, across various binary and multiclass classification tasks. Furthermore, we explore the influence of hyperparameters associated with the OWA operators and present a default configuration that performs well across different experimental settings.  ( 2 min )
    ImageNomer: description of a functional connectivity and omics analysis tool and case study identifying a race confound. (arXiv:2302.00767v2 [q-bio.PE] UPDATED)
    Most packages for the analysis of fMRI-based functional connectivity (FC) and genomic data are used with a programming language interface, lacking an easy-to-navigate GUI frontend. This exacerbates two problems found in these types of data: demographic confounds and quality control in the face of high dimensionality of features. The reason is that it is too slow and cumbersome to use a programming interface to create all the necessary visualizations required to identify all correlations, confounding effects, or quality control problems in a dataset. To remedy this situation, we have developed ImageNomer, a data visualization and analysis tool that allows inspection of both subject-level and cohort-level demographic, genomic, and imaging features. The software is Python-based, runs in a self-contained Docker image, and contains a browser-based GUI frontend. We demonstrate the usefulness of ImageNomer by identifying an unexpected race confound when predicting achievement scores in the Philadelphia Neurodevelopmental Cohort (PNC) dataset. In the past, many studies have attempted to use FC to identify achievement-related features in fMRI. Using ImageNomer, we find a clear potential for confounding effects of race. Using correlation analysis in the ImageNomer software, we show that FCs correlated with Wide Range Achievement Test (WRAT) score are in fact more highly correlated with race. Investigating further, we find that whereas both FC and SNP (genomic) features can account for 10-15\% of WRAT score variation, this predictive ability disappears when controlling for race. In this work, we demonstrate the advantage of our ImageNomer GUI tool in data exploration and confound detection. Additionally, this work identifies race as a strong confound in FC data and casts doubt on the possibility of finding unbiased achievement-related features in fMRI and SNP data of healthy adolescents.
    DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies. (arXiv:2310.04610v2 [cs.AI] UPDATED)
    In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences. This could herald a new era of scientific exploration, bringing significant advancements across sectors from drug development to renewable energy. To answer this call, we present DeepSpeed4Science initiative (deepspeed4science.ai) which aims to build unique capabilities through AI system technology innovations to help domain experts to unlock today's biggest science mysteries. By leveraging DeepSpeed's current technology pillars (training, inference and compression) as base technology enablers, DeepSpeed4Science will create a new set of AI system technologies tailored for accelerating scientific discoveries by addressing their unique complexity beyond the common technical approaches used for accelerating generic large language models (LLMs). In this paper, we showcase the early progress we made with DeepSpeed4Science in addressing two of the critical system challenges in structural biology research.
    Conditional Sig-Wasserstein GANs for Time Series Generation. (arXiv:2006.05421v2 [cs.LG] UPDATED)
    Generative adversarial networks (GANs) have been extremely successful in generating samples, from seemingly high dimensional probability measures. However, these methods struggle to capture the temporal dependence of joint probability distributions induced by time-series data. Furthermore, long time-series data streams hugely increase the dimension of the target space, which may render generative modelling infeasible. To overcome these challenges, motivated by the autoregressive models in econometric, we are interested in the conditional distribution of future time series given the past information. We propose the generic conditional Sig-WGAN framework by integrating Wasserstein-GANs (WGANs) with mathematically principled and efficient path feature extraction called the signature of a path. The signature of a path is a graded sequence of statistics that provides a universal description for a stream of data, and its expected value characterises the law of the time-series model. In particular, we develop the conditional Sig-$W_1$ metric, that captures the conditional joint law of time series models, and use it as a discriminator. The signature feature space enables the explicit representation of the proposed discriminators which alleviates the need for expensive training. We validate our method on both synthetic and empirical dataset and observe that our method consistently and significantly outperforms state-of-the-art benchmarks with respect to measures of similarity and predictive ability.  ( 3 min )
    Smoothed $f$-Divergence Distributionally Robust Optimization. (arXiv:2306.14041v2 [math.OC] UPDATED)
    In data-driven optimization, sample average approximation (SAA) is known to suffer from the so-called optimizer's curse that causes an over-optimistic evaluation of the solution performance. We argue that a special type of distributionallly robust optimization (DRO) formulation offers theoretical advantages in correcting for this optimizer's curse compared to simple ``margin'' adjustments to SAA and other DRO approaches: It attains a statistical bound on the out-of-sample performance, for a wide class of objective functions and distributions, that is nearly tightest in terms of exponential decay rate. This DRO uses an ambiguity set based on a Kullback Leibler (KL) divergence smoothed by the Wasserstein or L\'evy-Prokhorov (LP) distance via a suitable distance optimization. Computationally, we also show that such a DRO, and its generalized versions using smoothed $f$-divergence, are not harder than DRO problems based on $f$-divergence or Wasserstein distances, rendering our DRO formulations both statistically optimal and computationally viable.  ( 2 min )
    Exploring the Relationship Between Model Architecture and In-Context Learning Ability. (arXiv:2310.08049v1 [cs.LG])
    What is the relationship between model architecture and the ability to perform in-context learning? In this empirical study, we take the first steps towards answering this question. In particular, we evaluate fifteen model architectures across a suite of synthetic in-context learning tasks. The selected architectures represent a broad range of paradigms, including recurrent and convolution-based neural networks, transformers, and emerging attention alternatives. We discover that all considered architectures can perform in-context learning under certain conditions. However, contemporary architectures are found to be the best performing, especially as task complexity grows. Additionally, our follow-up experiments delve into various factors that influence in-context learning. We observe varied sensitivities among architectures with respect to hyperparameter settings. Our study of training dynamics reveals that certain architectures exhibit a smooth, progressive learning trajectory, while others demonstrate periods of stagnation followed by abrupt mastery of the task. Finally, and somewhat surprisingly, we find that several emerging attention alternatives are more robust in-context learners than transformers; since such approaches have constant-sized memory footprints at inference time, this result opens the future possibility of scaling up in-context learning to vastly larger numbers of in-context examples.
    BarlowRL: Barlow Twins for Data-Efficient Reinforcement Learning. (arXiv:2308.04263v3 [cs.LG] UPDATED)
    This paper introduces BarlowRL, a data-efficient reinforcement learning agent that combines the Barlow Twins self-supervised learning framework with DER (Data-Efficient Rainbow) algorithm. BarlowRL outperforms both DER and its contrastive counterpart CURL on the Atari 100k benchmark. BarlowRL avoids dimensional collapse by enforcing information spread to the whole space. This helps RL algorithms to utilize uniformly spread state representation that eventually results in a remarkable performance. The integration of Barlow Twins with DER enhances data efficiency and achieves superior performance in the RL tasks. BarlowRL demonstrates the potential of incorporating self-supervised learning techniques to improve RL algorithms.
    Network Synthetic Interventions: A Causal Framework for Panel Data Under Network Interference. (arXiv:2210.11355v2 [econ.EM] UPDATED)
    We propose a generalization of the synthetic controls and synthetic interventions methodology to incorporate network interference. We consider the estimation of unit-specific potential outcomes from panel data in the presence of spillover across units and unobserved confounding. Key to our approach is a novel latent factor model that takes into account network interference and generalizes the factor models typically used in panel data settings. We propose an estimator, Network Synthetic Interventions (NSI), and show that it consistently estimates the mean outcomes for a unit under an arbitrary set of counterfactual treatments for the network. We further establish that the estimator is asymptotically normal. We furnish two validity tests for whether the NSI estimator reliably generalizes to produce accurate counterfactual estimates. We provide a novel graph-based experiment design that guarantees the NSI estimator produces accurate counterfactual estimates, and also analyze the sample complexity of the proposed design. We conclude with simulations that corroborate our theoretical findings.
    Towards Data-and Knowledge-Driven Artificial Intelligence: A Survey on Neuro-Symbolic Computing. (arXiv:2210.15889v4 [cs.AI] UPDATED)
    Neural-symbolic computing (NeSy), which pursues the integration of the symbolic and statistical paradigms of cognition, has been an active research area of Artificial Intelligence (AI) for many years. As NeSy shows promise of reconciling the advantages of reasoning and interpretability of symbolic representation and robust learning in neural networks, it may serve as a catalyst for the next generation of AI. In the present paper, we provide a systematic overview of the recent developments and important contributions of NeSy research. Firstly, we introduce study history of this area, covering early work and foundations. We further discuss background concepts and identify key driving factors behind the development of NeSy. Afterward, we categorize recent landmark approaches along several main characteristics that underline this research paradigm, including neural-symbolic integration, knowledge representation, knowledge embedding, and functionality. Next, we briefly discuss the successful application of modern NeSy approaches in several domains. Then, we benchmark several NeSy methods on three representative application tasks. Finally, we identify the open problems together with potential future research directions. This survey is expected to help new researchers enter this rapidly evolving field and accelerate the progress towards data-and knowledge-driven AI.  ( 2 min )
    GePSAn: Generative Procedure Step Anticipation in Cooking Videos. (arXiv:2310.08312v1 [cs.CV])
    We study the problem of future step anticipation in procedural videos. Given a video of an ongoing procedural activity, we predict a plausible next procedure step described in rich natural language. While most previous work focus on the problem of data scarcity in procedural video datasets, another core challenge of future anticipation is how to account for multiple plausible future realizations in natural settings. This problem has been largely overlooked in previous work. To address this challenge, we frame future step prediction as modelling the distribution of all possible candidates for the next step. Specifically, we design a generative model that takes a series of video clips as input, and generates multiple plausible and diverse candidates (in natural language) for the next step. Following previous work, we side-step the video annotation scarcity by pretraining our model on a large text-based corpus of procedural activities, and then transfer the model to the video domain. Our experiments, both in textual and video domains, show that our model captures diversity in the next step prediction and generates multiple plausible future predictions. Moreover, our model establishes new state-of-the-art results on YouCookII, where it outperforms existing baselines on the next step anticipation. Finally, we also show that our model can successfully transfer from text to the video domain zero-shot, ie, without fine-tuning or adaptation, and produces good-quality future step predictions from video.
    GraphControl: Adding Conditional Control to Universal Graph Pre-trained Models for Graph Domain Transfer Learning. (arXiv:2310.07365v2 [cs.LG] UPDATED)
    Graph-structured data is ubiquitous in the world which models complex relationships between objects, enabling various Web applications. Daily influxes of unlabeled graph data on the Web offer immense potential for these applications. Graph self-supervised algorithms have achieved significant success in acquiring generic knowledge from abundant unlabeled graph data. These pre-trained models can be applied to various downstream Web applications, saving training time and improving downstream (target) performance. However, different graphs, even across seemingly similar domains, can differ significantly in terms of attribute semantics, posing difficulties, if not infeasibility, for transferring the pre-trained models to downstream tasks. Concretely speaking, for example, the additional task-specific node information in downstream tasks (specificity) is usually deliberately omitted so that the pre-trained representation (transferability) can be leveraged. The trade-off as such is termed as "transferability-specificity dilemma" in this work. To address this challenge, we introduce an innovative deployment module coined as GraphControl, motivated by ControlNet, to realize better graph domain transfer learning. Specifically, by leveraging universal structural pre-trained models and GraphControl, we align the input space across various graphs and incorporate unique characteristics of target data as conditional inputs. These conditions will be progressively integrated into the model during fine-tuning or prompt tuning through ControlNet, facilitating personalized deployment. Extensive experiments show that our method significantly enhances the adaptability of pre-trained models on target attributed datasets, achieving 1.4-3x performance gain. Furthermore, it outperforms training-from-scratch methods on target data with a comparable margin and exhibits faster convergence.
    Distilling Large Vision-Language Model with Out-of-Distribution Generalizability. (arXiv:2307.03135v3 [cs.CV] UPDATED)
    Large vision-language models have achieved outstanding performance, but their size and computational requirements make their deployment on resource-constrained devices and time-sensitive tasks impractical. Model distillation, the process of creating smaller, faster models that maintain the performance of larger models, is a promising direction towards the solution. This paper investigates the distillation of visual representations in large teacher vision-language models into lightweight student models using a small- or mid-scale dataset. Notably, this study focuses on open-vocabulary out-of-distribution (OOD) generalization, a challenging problem that has been overlooked in previous model distillation literature. We propose two principles from vision and language modality perspectives to enhance student's OOD generalization: (1) by better imitating teacher's visual representation space, and carefully promoting better coherence in vision-language alignment with the teacher; (2) by enriching the teacher's language representations with informative and finegrained semantic attributes to effectively distinguish between different labels. We propose several metrics and conduct extensive experiments to investigate their techniques. The results demonstrate significant improvements in zero-shot and few-shot student performance on open-vocabulary out-of-distribution classification, highlighting the effectiveness of our proposed approaches. Poster: https://xuanlinli17.github.io/pdfs/iccv23_large_vlm_distillation_poster.pdf Code: https://github.com/xuanlinli17/large_vlm_distillation_ood  ( 2 min )
    Imitation Learning from Observation with Automatic Discount Scheduling. (arXiv:2310.07433v2 [cs.RO] UPDATED)
    Humans often acquire new skills through observation and imitation. For robotic agents, learning from the plethora of unlabeled video demonstration data available on the Internet necessitates imitating the expert without access to its action, presenting a challenge known as Imitation Learning from Observations (ILfO). A common approach to tackle ILfO problems is to convert them into inverse reinforcement learning problems, utilizing a proxy reward computed from the agent's and the expert's observations. Nonetheless, we identify that tasks characterized by a progress dependency property pose significant challenges for such approaches; in these tasks, the agent needs to initially learn the expert's preceding behaviors before mastering the subsequent ones. Our investigation reveals that the main cause is that the reward signals assigned to later steps hinder the learning of initial behaviors. To address this challenge, we present a novel ILfO framework that enables the agent to master earlier behaviors before advancing to later ones. We introduce an Automatic Discount Scheduling (ADS) mechanism that adaptively alters the discount factor in reinforcement learning during the training phase, prioritizing earlier rewards initially and gradually engaging later rewards only when the earlier behaviors have been mastered. Our experiments, conducted on nine Meta-World tasks, demonstrate that our method significantly outperforms state-of-the-art methods across all tasks, including those that are unsolvable by them.
    PromptTTS 2: Describing and Generating Voices with Text Prompt. (arXiv:2309.02285v2 [eess.AS] UPDATED)
    Speech conveys more information than text, as the same word can be uttered in various voices to convey diverse information. Compared to traditional text-to-speech (TTS) methods relying on speech prompts (reference speech) for voice variability, using text prompts (descriptions) is more user-friendly since speech prompts can be hard to find or may not exist at all. TTS approaches based on the text prompt face two main challenges: 1) the one-to-many problem, where not all details about voice variability can be described in the text prompt, and 2) the limited availability of text prompt datasets, where vendors and large cost of data labeling are required to write text prompts for speech. In this work, we introduce PromptTTS 2 to address these challenges with a variation network to provide variability information of voice not captured by text prompts, and a prompt generation pipeline to utilize the large language models (LLM) to compose high quality text prompts. Specifically, the variation network predicts the representation extracted from the reference speech (which contains full information about voice variability) based on the text prompt representation. For the prompt generation pipeline, it generates text prompts for speech with a speech language understanding model to recognize voice attributes (e.g., gender, speed) from speech and a large language model to formulate text prompts based on the recognition results. Experiments on a large-scale (44K hours) speech dataset demonstrate that compared to the previous works, PromptTTS 2 generates voices more consistent with text prompts and supports the sampling of diverse voice variability, thereby offering users more choices on voice generation. Additionally, the prompt generation pipeline produces high-quality text prompts, eliminating the large labeling cost. The demo page of PromptTTS 2 is available online.
    A Neural-preconditioned Poisson Solver for Mixed Dirichlet and Neumann Boundary Conditions. (arXiv:2310.00177v3 [math.NA] UPDATED)
    We introduce a neural-preconditioned iterative solver for Poisson equations with mixed boundary conditions. The Poisson equation is ubiquitous in scientific computing: it governs a wide array of physical phenomena, arises as a subproblem in many numerical algorithms, and serves as a model problem for the broader class of elliptic PDEs. The most popular Poisson discretizations yield large sparse linear systems. At high resolution, and for performance-critical applications, iterative solvers can be advantageous for these -- but only when paired with powerful preconditioners. The core of our solver is a neural network trained to approximate the inverse of a discrete structured-grid Laplace operator for a domain of arbitrary shape and with mixed boundary conditions. The structure of this problem motivates a novel network architecture that we demonstrate is highly effective as a preconditioner even for boundary conditions outside the training set. We show that on challenging test cases arising from an incompressible fluid simulation, our method outperforms state-of-the-art solvers like algebraic multigrid as well as some recent neural preconditioners.
    TEMPO: Prompt-based Generative Pre-trained Transformer for Time Series Forecasting. (arXiv:2310.04948v2 [cs.LG] UPDATED)
    The past decade has witnessed significant advances in time series modeling with deep learning. While achieving state-of-the-art results, the best-performing architectures vary highly across applications and domains. Meanwhile, for natural language processing, the Generative Pre-trained Transformer (GPT) has demonstrated impressive performance via training one general-purpose model across various textual datasets. It is intriguing to explore whether GPT-type architectures can be effective for time series, capturing the intrinsic dynamic attributes and leading to significant accuracy improvements. In this paper, we propose a novel framework, TEMPO, that can effectively learn time series representations. We focus on utilizing two essential inductive biases of the time series task for pre-trained models: (i) decomposition of the complex interaction between trend, seasonal and residual components; and (ii) introducing the selection-based prompts to facilitate distribution adaptation in non-stationary time series. TEMPO expands the capability for dynamically modeling real-world temporal phenomena from data within diverse domains. Our experiments demonstrate the superior performance of TEMPO over state-of-the-art methods on a number of time series benchmark datasets. This performance gain is observed not only in standard supervised learning settings but also in scenarios involving previously unseen datasets as well as in scenarios with multi-modal inputs. This compelling finding highlights TEMPO's potential to constitute a foundational model-building framework.
    Efficient probabilistic reconciliation of forecasts for real-valued and count time series. (arXiv:2210.02286v3 [stat.ML] UPDATED)
    Hierarchical time series are common in several applied fields. The forecasts for these time series are required to be coherent, that is, to satisfy the constraints given by the hierarchy. The most popular technique to enforce coherence is called reconciliation, which adjusts the base forecasts computed for each time series. However, recent works on probabilistic reconciliation present several limitations. In this paper, we propose a new approach based on conditioning to reconcile any type of forecast distribution. We then introduce a new algorithm, called Bottom-Up Importance Sampling, to efficiently sample from the reconciled distribution. It can be used for any base forecast distribution: discrete, continuous, or in the form of samples, providing a major speedup compared to the current methods. Experiments on several temporal hierarchies show a significant improvement over base probabilistic forecasts.  ( 2 min )
    Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts. (arXiv:2310.05898v2 [cs.LG] UPDATED)
    Lion (Evolved Sign Momentum), a new optimizer discovered through program search, has shown promising results in training large AI models. It performs comparably or favorably to AdamW but with greater memory efficiency. As we can expect from the results of a random search program, Lion incorporates elements from several existing algorithms, including signed momentum, decoupled weight decay, Polak, and Nesterov momentum, but does not fit into any existing category of theoretically grounded optimizers. Thus, even though Lion appears to perform well as a general-purpose optimizer for a wide range of tasks, its theoretical basis remains uncertain. This lack of theoretical clarity limits opportunities to further enhance and expand Lion's efficacy. This work aims to demystify Lion. Based on both continuous-time and discrete-time analysis, we demonstrate that Lion is a theoretically novel and principled approach for minimizing a general loss function $f(x)$ while enforcing a bound constraint $\|x\|_\infty \leq 1/\lambda$. Lion achieves this through the incorporation of decoupled weight decay, where $\lambda$ represents the weight decay coefficient. Our analysis is made possible by the development of a new Lyapunov function for the Lion updates. It applies to a broader family of Lion-$\kappa$ algorithms, where the $\text{sign}(\cdot)$ operator in Lion is replaced by the subgradient of a convex function $\kappa$, leading to the solution of a general composite optimization problem of $\min_x f(x) + \kappa^*(x)$. Our findings provide valuable insights into the dynamics of Lion and pave the way for further improvements and extensions of Lion-related algorithms.
    Clustering Three-Way Data with Outliers. (arXiv:2310.05288v2 [stat.ML] UPDATED)
    Matrix-variate distributions are a recent addition to the model-based clustering field, thereby making it possible to analyze data in matrix form with complex structure such as images and time series. Due to its recent appearance, there is limited literature on matrix-variate data, with even less on dealing with outliers in these models. An approach for clustering matrix-variate normal data with outliers is discussed. The approach, which uses the distribution of subset log-likelihoods, extends the OCLUST algorithm to matrix-variate normal data and uses an iterative approach to detect and trim outliers.
    Federated Generalization via Information-Theoretic Distribution Diversification. (arXiv:2310.07171v2 [cs.LG] UPDATED)
    Federated Learning (FL) has surged in prominence due to its capability of collaborative model training without direct data sharing. However, the vast disparity in local data distributions among clients, often termed the non-Independent Identically Distributed (non-IID) challenge, poses a significant hurdle to FL's generalization efficacy. The scenario becomes even more complex when not all clients participate in the training process, a common occurrence due to unstable network connections or limited computational capacities. This can greatly complicate the assessment of the trained models' generalization abilities. While a plethora of recent studies has centered on the generalization gap pertaining to unseen data from participating clients with diverse distributions, the divergence between the training distributions of participating clients and the testing distributions of non-participating ones has been largely overlooked. In response, our paper unveils an information-theoretic generalization framework for FL. Specifically, it quantifies generalization errors by evaluating the information entropy of local distributions and discerning discrepancies across these distributions. Inspired by our deduced generalization bounds, we introduce a weighted aggregation approach and a duo of client selection strategies. These innovations aim to bolster FL's generalization prowess by encompassing a more varied set of client data distributions. Our extensive empirical evaluations reaffirm the potency of our proposed methods, aligning seamlessly with our theoretical construct.
    SpikeCLIP: A Contrastive Language-Image Pretrained Spiking Neural Network. (arXiv:2310.06488v2 [cs.NE] UPDATED)
    Spiking neural networks (SNNs) have demonstrated the capability to achieve comparable performance to deep neural networks (DNNs) in both visual and linguistic domains while offering the advantages of improved energy efficiency and adherence to biological plausibility. However, the extension of such single-modality SNNs into the realm of multimodal scenarios remains an unexplored territory. Drawing inspiration from the concept of contrastive language-image pre-training (CLIP), we introduce a novel framework, named SpikeCLIP, to address the gap between two modalities within the context of spike-based computing through a two-step recipe involving ``Alignment Pre-training + Dual-Loss Fine-tuning". Extensive experiments demonstrate that SNNs achieve comparable results to their DNN counterparts while significantly reducing energy consumption across a variety of datasets commonly used for multimodal model evaluation. Furthermore, SpikeCLIP maintains robust performance in image classification tasks that involve class labels not predefined within specific categories.
    GP-net: Flexible Viewpoint Grasp Proposal. (arXiv:2209.10404v3 [cs.RO] UPDATED)
    We present the Grasp Proposal Network (GP-net), a Convolutional Neural Network model which can generate 6-DoF grasps from flexible viewpoints, e.g. as experienced by mobile manipulators. To train GP-net, we synthetically generate a dataset containing depth-images and ground-truth grasp information. In real-world experiments, we use the EGAD evaluation benchmark to evaluate GP-net against two commonly used algorithms, the Volumetric Grasping Network (VGN) and the Grasp Pose Detection package (GPD), on a PAL TIAGo mobile manipulator. In contrast to the state-of-the-art methods in robotic grasping, GP-net can be used for grasping objects from flexible, unknown viewpoints without the need to define the workspace and achieves a grasp success of 54.4% compared to 51.6% for VGN and 44.2% for GPD. We provide a ROS package along with our code and pre-trained models at https://aucoroboticsmu.github.io/GP-net/.
    FedSym: Unleashing the Power of Entropy for Benchmarking the Algorithms for Federated Learning. (arXiv:2310.07807v1 [cs.LG])
    Federated learning (FL) is a decentralized machine learning approach where independent learners process data privately. Its goal is to create a robust and accurate model by aggregating and retraining local models over multiple rounds. However, FL faces challenges regarding data heterogeneity and model aggregation effectiveness. In order to simulate real-world data, researchers use methods for data partitioning that transform a dataset designated for centralized learning into a group of sub-datasets suitable for distributed machine learning with different data heterogeneity. In this paper, we study the currently popular data partitioning techniques and visualize their main disadvantages: the lack of precision in the data diversity, which leads to unreliable heterogeneity indexes, and the inability to incrementally challenge the FL algorithms. To resolve this problem, we propose a method that leverages entropy and symmetry to construct 'the most challenging' and controllable data distributions with gradual difficulty. We introduce a metric to measure data heterogeneity among the learning agents and a transformation technique that divides any dataset into splits with precise data diversity. Through a comparative study, we demonstrate the superiority of our method over existing FL data partitioning approaches, showcasing its potential to challenge model aggregation algorithms. Experimental results indicate that our approach gradually challenges the FL strategies, and the models trained on FedSym distributions are more distinct.
    GPT-4 as an Agronomist Assistant? Answering Agriculture Exams Using Large Language Models. (arXiv:2310.06225v2 [cs.AI] UPDATED)
    Large language models (LLMs) have demonstrated remarkable capabilities in natural language understanding across various domains, including healthcare and finance. For some tasks, LLMs achieve similar or better performance than trained human beings, therefore it is reasonable to employ human exams (e.g., certification tests) to assess the performance of LLMs. We present a comprehensive evaluation of popular LLMs, such as Llama 2 and GPT, on their ability to answer agriculture-related questions. In our evaluation, we also employ RAG (Retrieval-Augmented Generation) and ER (Ensemble Refinement) techniques, which combine information retrieval, generation capabilities, and prompting strategies to improve the LLMs' performance. To demonstrate the capabilities of LLMs, we selected agriculture exams and benchmark datasets from three of the largest agriculture producer countries: Brazil, India, and the USA. Our analysis highlights GPT-4's ability to achieve a passing score on exams to earn credits for renewing agronomist certifications, answering 93% of the questions correctly and outperforming earlier general-purpose models, which achieved 88% accuracy. On one of our experiments, GPT-4 obtained the highest performance when compared to human subjects. This performance suggests that GPT-4 could potentially pass on major graduate education admission tests or even earn credits for renewing agronomy certificates. We also explore the models' capacity to address general agriculture-related questions and generate crop management guidelines for Brazilian and Indian farmers, utilizing robust datasets from the Brazilian Agency of Agriculture (Embrapa) and graduate program exams from India. The results suggest that GPT-4, ER, and RAG can contribute meaningfully to agricultural education, assessment, and crop management practice, offering valuable insights to farmers and agricultural professionals.
    NECO: NEural Collapse Based Out-of-distribution detection. (arXiv:2310.06823v2 [stat.ML] UPDATED)
    Detecting out-of-distribution (OOD) data is a critical challenge in machine learning due to model overconfidence, often without awareness of their epistemological limits. We hypothesize that ``neural collapse'', a phenomenon affecting in-distribution data for models trained beyond loss convergence, also influences OOD data. To benefit from this interplay, we introduce NECO, a novel post-hoc method for OOD detection, which leverages the geometric properties of ``neural collapse'' and of principal component spaces to identify OOD data. Our extensive experiments demonstrate that NECO achieves state-of-the-art results on both small and large-scale OOD detection tasks while exhibiting strong generalization capabilities across different network architectures. Furthermore, we provide a theoretical explanation for the effectiveness of our method in OOD detection. We plan to release the code after the anonymity period.
    Locality-Aware Generalizable Implicit Neural Representation. (arXiv:2310.05624v2 [cs.LG] UPDATED)
    Generalizable implicit neural representation (INR) enables a single continuous function, i.e., a coordinate-based neural network, to represent multiple data instances by modulating its weights or intermediate features using latent codes. However, the expressive power of the state-of-the-art modulation is limited due to its inability to localize and capture fine-grained details of data entities such as specific pixels and rays. To address this issue, we propose a novel framework for generalizable INR that combines a transformer encoder with a locality-aware INR decoder. The transformer encoder predicts a set of latent tokens from a data instance to encode local information into each latent token. The locality-aware INR decoder extracts a modulation vector by selectively aggregating the latent tokens via cross-attention for a coordinate input and then predicts the output by progressively decoding with coarse-to-fine modulation through multiple frequency bandwidths. The selective token aggregation and the multi-band feature modulation enable us to learn locality-aware representation in spatial and spectral aspects, respectively. Our framework significantly outperforms previous generalizable INRs and validates the usefulness of the locality-aware latents for downstream tasks such as image generation.
    Defending Our Privacy With Backdoors. (arXiv:2310.08320v1 [cs.LG])
    The proliferation of large AI models trained on uncurated, often sensitive web-scraped data has raised significant privacy concerns. One of the concerns is that adversaries can extract information about the training data using privacy attacks. Unfortunately, the task of removing specific information from the models without sacrificing performance is not straightforward and has proven to be challenging. We propose a rather easy yet effective defense based on backdoor attacks to remove private information such as names of individuals from models, and focus in this work on text encoders. Specifically, through strategic insertion of backdoors, we align the embeddings of sensitive phrases with those of neutral terms-"a person" instead of the person's name. Our empirical results demonstrate the effectiveness of our backdoor-based defense on CLIP by assessing its performance using a specialized privacy attack for zero-shot classifiers. Our approach provides not only a new "dual-use" perspective on backdoor attacks, but also presents a promising avenue to enhance the privacy of individuals within models trained on uncurated web-scraped data.
    On Regularized Sparse Logistic Regression. (arXiv:2309.05925v2 [cs.LG] UPDATED)
    Sparse logistic regression is for classification and feature selection simultaneously. Although many studies have been done to solve $\ell_1$-regularized logistic regression, there is no equivalently abundant work on solving sparse logistic regression with nonconvex regularization term. In this paper, we propose a unified framework to solve $\ell_1$-regularized logistic regression, which can be naturally extended to nonconvex regularization term, as long as certain requirement is satisfied. In addition, we also utilize a different line search criteria to guarantee monotone convergence for various regularization terms. Empirical experiments on binary classification tasks with real-world datasets demonstrate our proposed algorithms are capable of performing classification and feature selection effectively at a lower computational cost.
    Rethinking Negative Pairs in Code Search. (arXiv:2310.08069v1 [cs.SE])
    Recently, contrastive learning has become a key component in fine-tuning code search models for software development efficiency and effectiveness. It pulls together positive code snippets while pushing negative samples away given search queries. Among contrastive learning, InfoNCE is the most widely used loss function due to its better performance. However, the following problems in negative samples of InfoNCE may deteriorate its representation learning: 1) The existence of false negative samples in large code corpora due to duplications. 2). The failure to explicitly differentiate between the potential relevance of negative samples. As an example, a bubble sorting algorithm example is less ``negative'' than a file saving function for the quick sorting algorithm query. In this paper, we tackle the above problems by proposing a simple yet effective Soft-InfoNCE loss that inserts weight terms into InfoNCE. In our proposed loss function, we apply three methods to estimate the weights of negative pairs and show that the vanilla InfoNCE loss is a special case of Soft-InfoNCE. Theoretically, we analyze the effects of Soft-InfoNCE on controlling the distribution of learnt code representations and on deducing a more precise mutual information estimation. We furthermore discuss the superiority of proposed loss functions with other design alternatives. Extensive experiments demonstrate the effectiveness of Soft-InfoNCE and weights estimation methods under state-of-the-art code search models on a large-scale public dataset consisting of six programming languages. Source code is available at \url{https://github.com/Alex-HaochenLi/Soft-InfoNCE}.
    Learning Collaborative Information Dissemination with Graph-based Multi-Agent Reinforcement Learning. (arXiv:2308.16198v2 [cs.LG] UPDATED)
    In modern communication systems, efficient and reliable information dissemination is crucial for supporting critical operations across domains like disaster response, autonomous vehicles, and sensor networks. This paper introduces a Multi-Agent Reinforcement Learning (MARL) approach as a significant step forward in achieving more decentralized, efficient, and collaborative solutions. We propose a Partially Observable Stochastic Game (POSG) formulation for information dissemination empowering each agent to decide on message forwarding independently, based on their one-hop neighborhood. This constitutes a significant paradigm shift from traditional heuristics based on Multi-Point Relay (MPR) selection. Our approach harnesses Graph Convolutional Reinforcement Learning, employing Graph Attention Networks (GAT) with dynamic attention to capture essential network features. We propose two approaches, L-DGN and HL-DGN, which differ in the information that is exchanged among agents. We evaluate the performance of our decentralized approaches, by comparing them with a widely-used MPR heuristic, and we show that our trained policies are able to efficiently cover the network while bypassing the MPR set selection process. Our approach is a first step toward supporting the resilience of real-world broadcast communication infrastructures via learned, collaborative information dissemination.
    Nest-DGIL: Nesterov-optimized Deep Geometric Incremental Learning for CS Image Reconstruction. (arXiv:2308.03807v2 [eess.IV] UPDATED)
    Proximal gradient-based optimization is one of the most common strategies to solve inverse problem of images, and it is easy to implement. However, these techniques often generate heavy artifacts in image reconstruction. One of the most popular refinement methods is to fine-tune the regularization parameter to alleviate such artifacts, but it may not always be sufficient or applicable due to increased computational costs. In this work, we propose a deep geometric incremental learning framework based on the second Nesterov proximal gradient optimization. The proposed end-to-end network not only has the powerful learning ability for high-/low-frequency image features, but also can theoretically guarantee that geometric texture details will be reconstructed from preliminary linear reconstruction. Furthermore, it can avoid the risk of intermediate reconstruction results falling outside the geometric decomposition domains and achieve fast convergence. Our reconstruction framework is decomposed into four modules including general linear reconstruction, cascade geometric incremental restoration, Nesterov acceleration, and post-processing. In the image restoration step, a cascade geometric incremental learning module is designed to compensate for missing texture information from different geometric spectral decomposition domains. Inspired by the overlap-tile strategy, we also develop a post-processing module to remove the block effect in patch-wise-based natural image reconstruction. All parameters in the proposed model are learnable, an adaptive initialization technique of physical parameters is also employed to make model flexibility and ensure converging smoothly. We compare the reconstruction performance of the proposed method with existing state-of-the-art methods to demonstrate its superiority. Our source codes are available at https://github.com/fanxiaohong/Nest-DGIL.
    COVID-19 Detection Using Swin Transformer Approach from Computed Tomography Images. (arXiv:2310.08165v1 [eess.IV])
    The accurate and efficient diagnosis of COVID-19 is of paramount importance, particularly in the context of large-scale medical imaging datasets. In this preprint paper, we propose a novel approach for COVID-19 diagnosis using CT images that leverages the power of Swin Transformer models, state-of-the-art solutions in computer vision tasks. Our method includes a systematic approach for patient-level predictions, where individual CT slices are classified as COVID-19 or non-COVID, and the patient's overall diagnosis is determined through majority voting. The application of the Swin Transformer in this context results in patient-level predictions that demonstrate exceptional diagnostic accuracy. In terms of evaluation metrics, our approach consistently outperforms the baseline, as well as numerous competing methods, showcasing its effectiveness in COVID-19 diagnosis. The macro F1 score achieved by our model exceeds the baseline and offers a robust solution for accurate diagnosis.
    Learn From Model Beyond Fine-Tuning: A Survey. (arXiv:2310.08184v1 [cs.AI])
    Foundation models (FM) have demonstrated remarkable performance across a wide range of tasks (especially in the fields of natural language processing and computer vision), primarily attributed to their ability to comprehend instructions and access extensive, high-quality data. This not only showcases their current effectiveness but also sets a promising trajectory towards the development of artificial general intelligence. Unfortunately, due to multiple constraints, the raw data of the model used for large model training are often inaccessible, so the use of end-to-end models for downstream tasks has become a new research trend, which we call Learn From Model (LFM) in this article. LFM focuses on the research, modification, and design of FM based on the model interface, so as to better understand the model structure and weights (in a black box environment), and to generalize the model to downstream tasks. The study of LFM techniques can be broadly categorized into five major areas: model tuning, model distillation, model reuse, meta learning and model editing. Each category encompasses a repertoire of methods and strategies that aim to enhance the capabilities and performance of FM. This paper gives a comprehensive review of the current methods based on FM from the perspective of LFM, in order to help readers better understand the current research status and ideas. To conclude, we summarize the survey by highlighting several critical areas for future exploration and addressing open issues that require further attention from the research community. The relevant papers we investigated in this article can be accessed at .
    Emergence of Latent Binary Encoding in Deep Neural Network Classifiers. (arXiv:2310.08224v1 [cs.LG])
    We observe the emergence of binary encoding within the latent space of deep-neural-network classifiers. Such binary encoding is induced by introducing a linear penultimate layer, which is equipped during training with a loss function that grows as $\exp(\vec{x}^2)$, where $\vec{x}$ are the coordinates in the latent space. The phenomenon we describe represents a specific instance of a well-documented occurrence known as \textit{neural collapse}, which arises in the terminal phase of training and entails the collapse of latent class means to the vertices of a simplex equiangular tight frame (ETF). We show that binary encoding accelerates convergence toward the simplex ETF and enhances classification accuracy.
    On Training Derivative-Constrained Neural Networks. (arXiv:2310.01649v2 [cs.LG] UPDATED)
    We refer to the setting where the (partial) derivatives of a neural network's (NN's) predictions with respect to its inputs are used as additional training signal as a derivative-constrained (DC) NN. This situation is common in physics-informed settings in the natural sciences. We propose an integrated RELU (IReLU) activation function to improve training of DC NNs. We also investigate denormalization and label rescaling to help stabilize DC training. We evaluate our methods on physics-informed settings including quantum chemistry and Scientific Machine Learning (SciML) tasks. We demonstrate that existing architectures with IReLU activations combined with denormalization and label rescaling better incorporate training signal provided by derivative constraints.
    Generalization bounds for neural ordinary differential equations and deep residual networks. (arXiv:2305.06648v2 [stat.ML] UPDATED)
    Neural ordinary differential equations (neural ODEs) are a popular family of continuous-depth deep learning models. In this work, we consider a large family of parameterized ODEs with continuous-in-time parameters, which include time-dependent neural ODEs. We derive a generalization bound for this class by a Lipschitz-based argument. By leveraging the analogy between neural ODEs and deep residual networks, our approach yields in particular a generalization bound for a class of deep residual networks. The bound involves the magnitude of the difference between successive weight matrices. We illustrate numerically how this quantity affects the generalization capability of neural networks.  ( 2 min )
    Asynchronous Evolution of Deep Neural Network Architectures. (arXiv:2308.04102v2 [cs.NE] UPDATED)
    Many evolutionary algorithms (EAs) take advantage of parallel evaluation of candidates. However, if evaluation times vary significantly, many worker nodes (i.e.,\ compute clients) are idle much of the time, waiting for the next generation to be created. Evolutionary neural architecture search (ENAS), a class of EAs that optimizes the architecture and hyperparameters of deep neural networks, is particularly vulnerable to this issue. This paper proposes a generic asynchronous evaluation strategy (AES) that is then adapted to work with ENAS. AES increases throughput by maintaining a queue of up to $K$ individuals ready to be sent to the workers for evaluation and proceeding to the next generation as soon as $M<<K$ individuals have been evaluated. A suitable value for $M$ is determined experimentally, balancing diversity and efficiency. To showcase the generality and power of AES, it was first evaluated in eight-line sorting network design (a single-population optimization task with limited evaluation-time variability), achieving an over two-fold speedup. Next, it was evaluated in 11-bit multiplexer design (a single-population discovery task with extended variability), where a 14-fold speedup was observed. It was then scaled up to ENAS for image captioning (a multi-population open-ended-optimization task), resulting in an over two-fold speedup. In all problems, a multifold performance improvement was observed, suggesting that AES is a promising method for parallelizing the evolution of complex systems with long and variable evaluation times, such as those in ENAS.
    Measuring Feature Sparsity in Language Models. (arXiv:2310.07837v1 [cs.LG])
    Recent works have proposed that activations in language models can be modelled as sparse linear combinations of vectors corresponding to features of input text. Under this assumption, these works aimed to reconstruct feature directions using sparse coding. We develop metrics to assess the success of these sparse coding techniques and test the validity of the linearity and sparsity assumptions. We show our metrics can predict the level of sparsity on synthetic sparse linear activations, and can distinguish between sparse linear data and several other distributions. We use our metrics to measure levels of sparsity in several language models. We find evidence that language model activations can be accurately modelled by sparse linear combinations of features, significantly more so than control datasets. We also show that model activations appear to be sparsest in the first and final layers.
    DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning. (arXiv:2309.05173v2 [cs.CL] UPDATED)
    Prompt tuning (PT), where a small amount of trainable soft (continuous) prompt vectors is affixed to the input of language models (LM), has shown promising results across various tasks and models for parameter-efficient fine-tuning (PEFT). PT stands out from other PEFT approaches because it maintains competitive performance with fewer trainable parameters and does not drastically scale up its parameters as the model size expands. However, PT introduces additional soft prompt tokens, leading to longer input sequences, which significantly impacts training and inference time and memory usage due to the Transformer's quadratic complexity. Particularly concerning for Large Language Models (LLMs) that face heavy daily querying. To address this issue, we propose Decomposed Prompt Tuning (DePT), which decomposes the soft prompt into a shorter soft prompt and a pair of low-rank matrices that are then optimised with two different learning rates. This allows DePT to achieve better performance while saving over 20% memory and time costs compared to vanilla PT and its variants, without changing trainable parameter sizes. Through extensive experiments on 23 natural language processing (NLP) and vision-language (VL) tasks, we demonstrate that DePT outperforms state-of-the-art PEFT approaches, including the full fine-tuning baseline in some scenarios. Additionally, we empirically show that DEPT grows more efficient as the model size increases. Our further study reveals that DePT integrates seamlessly with parameter-efficient transfer learning in the few-shot learning setting and highlights its adaptability to various model architectures and sizes.
    Memorization with neural nets: going beyond the worst case. (arXiv:2310.00327v2 [stat.ML] UPDATED)
    In practice, deep neural networks are often able to easily interpolate their training data. To understand this phenomenon, many works have aimed to quantify the memorization capacity of a neural network architecture: the largest number of points such that the architecture can interpolate any placement of these points with any assignment of labels. For real-world data, however, one intuitively expects the presence of a benign structure so that interpolation already occurs at a smaller network size than suggested by memorization capacity. In this paper, we investigate interpolation by adopting an instance-specific viewpoint. We introduce a simple randomized algorithm that, given a fixed finite dataset with two classes, with high probability constructs an interpolating three-layer neural network in polynomial time. The required number of parameters is linked to geometric properties of the two classes and their mutual arrangement. As a result, we obtain guarantees that are independent of the number of samples and hence move beyond worst-case memorization capacity bounds. We illustrate the effectiveness of the algorithm in non-pathological situations with extensive numerical experiments and link the insights back to the theoretical results.
    A Theoretical Explanation of Activation Sparsity through Flat Minima and Adversarial Robustness. (arXiv:2309.03004v2 [cs.LG] UPDATED)
    A recent empirical observation (Li et al., 2022b) of activation sparsity in MLP blocks offers an opportunity to drastically reduce computation costs for free. Although having attributed it to training dynamics, existing theoretical explanations of activation sparsity are restricted to shallow networks, small training steps and special training, despite its emergence in deep models standardly trained for a large number of steps. To fill these gaps, we propose the notion of gradient sparsity as one source of activation sparsity and a theoretical explanation based on it that sees sparsity a necessary step to adversarial robustness w.r.t. hidden features and parameters, which is approximately the flatness of minima for well-learned models. The theory applies to standardly trained LayerNorm-ed MLPs, and further to Transformers or other architectures trained with weight noises. Eliminating other sources of flatness except for sparsity, we discover the phenomenon that the ratio between the largest and smallest non-zero singular values of weight matrices is small. When discussing the emergence of this spectral concentration, we use random matrix theory (RMT) as a powerful tool to analyze stochastic gradient noises. Validational experiments are conducted to verify our gradient-sparsity-based explanation. We propose two plug-and-play modules for both training and finetuning for sparsity. Experiments on ImageNet-1k and C4 demonstrate their 50% sparsity improvements, indicating further potential cost reduction in both training and inference.
    Semantic-Forward Relaying: A Novel Framework Towards 6G Cooperative Communications. (arXiv:2310.07987v1 [cs.NI])
    This letter proposes a novel relaying framework, semantic-forward (SF), for cooperative communications towards the sixth-generation (6G) wireless networks. The SF relay extracts and transmits the semantic features, which reduces forwarding payload, and also improves the network robustness against intra-link errors. Based on the theoretical basis for cooperative communications with side information and the turbo principle, we design a joint source-channel coding algorithm to iteratively exchange the extrinsic information for enhancing the decoding gains at the destination. Surprisingly, simulation results indicate that even in bad channel conditions, SF relaying can still effectively improve the recovered information quality.
    On the Security Vulnerabilities of Text-to-SQL Models. (arXiv:2211.15363v3 [cs.CL] UPDATED)
    Although it has been demonstrated that Natural Language Processing (NLP) algorithms are vulnerable to deliberate attacks, the question of whether such weaknesses can lead to software security threats is under-explored. To bridge this gap, we conducted vulnerability tests on Text-to-SQL systems that are commonly used to create natural language interfaces to databases. We showed that the Text-to-SQL modules within six commercial applications can be manipulated to produce malicious code, potentially leading to data breaches and Denial of Service attacks. This is the first demonstration that NLP models can be exploited as attack vectors in the wild. In addition, experiments using four open-source language models verified that straightforward backdoor attacks on Text-to-SQL systems achieve a 100% success rate without affecting their performance. The aim of this work is to draw the community's attention to potential software security issues associated with NLP algorithms and encourage exploration of methods to mitigate against them.
    Learning to Generate Novel Scientific Directions with Contextualized Literature-based Discovery. (arXiv:2305.14259v3 [cs.CL] UPDATED)
    Literature-Based Discovery (LBD) aims to discover new scientific knowledge by mining papers and generating hypotheses. Standard LBD is limited to predicting pairwise relations between discrete concepts (e.g., drug-disease links), and ignores critical contexts like experimental settings (e.g., a specific patient population where a drug is evaluated) and background motivations (e.g., to find drugs without specific side effects). We address these limitations with a novel formulation of contextualized-LBD (C-LBD): generating scientific hypotheses in natural language, while grounding them in a context that controls the hypothesis search space. We present a modeling framework using retrieval of ``inspirations'' from past scientific papers. Our evaluations reveal that GPT-4 tends to generate ideas with overall low technical depth and novelty, while our inspiration prompting approaches partially mitigate this issue. Our work represents a first step toward building language models that generate new ideas derived from scientific literature.  ( 2 min )
    Reset It and Forget It: Relearning Last-Layer Weights Improves Continual and Transfer Learning. (arXiv:2310.07996v1 [cs.LG])
    This work identifies a simple pre-training mechanism that leads to representations exhibiting better continual and transfer learning. This mechanism -- the repeated resetting of weights in the last layer, which we nickname "zapping" -- was originally designed for a meta-continual-learning procedure, yet we show it is surprisingly applicable in many settings beyond both meta-learning and continual learning. In our experiments, we wish to transfer a pre-trained image classifier to a new set of classes, in a few shots. We show that our zapping procedure results in improved transfer accuracy and/or more rapid adaptation in both standard fine-tuning and continual learning settings, while being simple to implement and computationally efficient. In many cases, we achieve performance on par with state of the art meta-learning without needing the expensive higher-order gradients, by using a combination of zapping and sequential learning. An intuitive explanation for the effectiveness of this zapping procedure is that representations trained with repeated zapping learn features that are capable of rapidly adapting to newly initialized classifiers. Such an approach may be considered a computationally cheaper type of, or alternative to, meta-learning rapidly adaptable features with higher-order gradients. This adds to recent work on the usefulness of resetting neural network parameters during training, and invites further investigation of this mechanism.
    Finite Scalar Quantization: VQ-VAE Made Simple. (arXiv:2309.15505v2 [cs.CV] UPDATED)
    We propose to replace vector quantization (VQ) in the latent representation of VQ-VAEs with a simple scheme termed finite scalar quantization (FSQ), where we project the VAE representation down to a few dimensions (typically less than 10). Each dimension is quantized to a small set of fixed values, leading to an (implicit) codebook given by the product of these sets. By appropriately choosing the number of dimensions and values each dimension can take, we obtain the same codebook size as in VQ. On top of such discrete representations, we can train the same models that have been trained on VQ-VAE representations. For example, autoregressive and masked transformer models for image generation, multimodal generation, and dense prediction computer vision tasks. Concretely, we employ FSQ with MaskGIT for image generation, and with UViM for depth estimation, colorization, and panoptic segmentation. Despite the much simpler design of FSQ, we obtain competitive performance in all these tasks. We emphasize that FSQ does not suffer from codebook collapse and does not need the complex machinery employed in VQ (commitment losses, codebook reseeding, code splitting, entropy penalties, etc.) to learn expressive discrete representations.
    Neural Diffusion Models. (arXiv:2310.08337v1 [cs.LG])
    Diffusion models have shown remarkable performance on many generative tasks. Despite recent success, most diffusion models are restricted in that they only allow linear transformation of the data distribution. In contrast, broader family of transformations can potentially help train generative distributions more efficiently, simplifying the reverse process and closing the gap between the true negative log-likelihood and the variational approximation. In this paper, we present Neural Diffusion Models (NDMs), a generalization of conventional diffusion models that enables defining and learning time-dependent non-linear transformations of data. We show how to optimise NDMs using a variational bound in a simulation-free setting. Moreover, we derive a time-continuous formulation of NDMs, which allows fast and reliable inference using off-the-shelf numerical ODE and SDE solvers. Finally, we demonstrate the utility of NDMs with learnable transformations through experiments on standard image generation benchmarks, including CIFAR-10, downsampled versions of ImageNet and CelebA-HQ. NDMs outperform conventional diffusion models in terms of likelihood and produce high-quality samples.
    Pure Monte Carlo Counterfactual Regret Minimization. (arXiv:2309.03084v2 [cs.AI] UPDATED)
    Counterfactual Regret Minimization (CFR) and its variants are the best algorithms so far for solving large-scale incomplete information games. However, we believe that there are two problems with CFR: First, matrix multiplication is required in CFR iteration, and the time complexity of one iteration is too high; Secondly, the game characteristics in the real world are different. Just using one CFR algorithm will not be perfectly suitable for all game problems. For these two problems, this paper proposes a new algorithm called Pure CFR (PCFR) based on CFR. PCFR can be seen as a combination of CFR and Fictitious Play (FP), inheriting the concept of counterfactual regret (value) from CFR, and using the best response strategy instead of the regret matching strategy for the next iteration. This algorithm has three advantages. First, PCFR can be combined with any CFR variant. The resulting Pure MCCFR (PMCCFR) can significantly reduce the time and space complexity of one iteration. Secondly, our experiments show that the convergence speed of the PMCCFR is 2$\sim$3 times that of the MCCFR. Finally, there is a type of game that is very suitable for PCFR, we call this type of game clear-game, which is characterized by a high proportion of dominated strategies. Experiments show that in clear-game, the convergence rate of PMCCFR is two orders of magnitude higher than that of MCCFR.
    A Carbon Tracking Model for Federated Learning: Impact of Quantization and Sparsification. (arXiv:2310.08087v1 [eess.SP])
    Federated Learning (FL) methods adopt efficient communication technologies to distribute machine learning tasks across edge devices, reducing the overhead in terms of data storage and computational complexity compared to centralized solutions. Rather than moving large data volumes from producers (sensors, machines) to energy-hungry data centers, raising environmental concerns due to resource demands, FL provides an alternative solution to mitigate the energy demands of several learning tasks while enabling new Artificial Intelligence of Things (AIoT) applications. This paper proposes a framework for real-time monitoring of the energy and carbon footprint impacts of FL systems. The carbon tracking tool is evaluated for consensus (fully decentralized) and classical FL policies. For the first time, we present a quantitative evaluation of different computationally and communication efficient FL methods from the perspectives of energy consumption and carbon equivalent emissions, suggesting also general guidelines for energy-efficient design. Results indicate that consensus-driven FL implementations should be preferred for limiting carbon emissions when the energy efficiency of the communication is low (i.e., < 25 Kbit/Joule). Besides, quantization and sparsification operations are shown to strike a balance between learning performances and energy consumption, leading to sustainable FL designs.
    Deep Reinforcement Learning for Autonomous Cyber Operations: A Survey. (arXiv:2310.07745v1 [cs.LG])
    The rapid increase in the number of cyber-attacks in recent years raises the need for principled methods for defending networks against malicious actors. Deep reinforcement learning (DRL) has emerged as a promising approach for mitigating these attacks. However, while DRL has shown much potential for cyber-defence, numerous challenges must be overcome before DRL can be applied to autonomous cyber-operations (ACO) at scale. Principled methods are required for environments that confront learners with very high-dimensional state spaces, large multi-discrete action spaces, and adversarial learning. Recent works have reported success in solving these problems individually. There have also been impressive engineering efforts towards solving all three for real-time strategy games. However, applying DRL to the full ACO problem remains an open challenge. Here, we survey the relevant DRL literature and conceptualize an idealised ACO-DRL agent. We provide: i.) A summary of the domain properties that define the ACO problem; ii.) A comprehensive evaluation of the extent to which domains used for benchmarking DRL approaches are comparable to ACO; iii.) An overview of state-of-the-art approaches for scaling DRL to domains that confront learners with the curse of dimensionality, and; iv.) A survey and critique of current methods for limiting the exploitability of agents within adversarial settings from the perspective of ACO. We conclude with open research questions that we hope will motivate future directions for researchers and practitioners working on ACO.
    Towards the Fundamental Limits of Knowledge Transfer over Finite Domains. (arXiv:2310.07838v1 [cs.LG])
    We characterize the statistical efficiency of knowledge transfer through $n$ samples from a teacher to a probabilistic student classifier with input space $\mathcal S$ over labels $\mathcal A$. We show that privileged information at three progressive levels accelerates the transfer. At the first level, only samples with hard labels are known, via which the maximum likelihood estimator attains the minimax rate $\sqrt{{|{\mathcal S}||{\mathcal A}|}/{n}}$. The second level has the teacher probabilities of sampled labels available in addition, which turns out to boost the convergence rate lower bound to ${{|{\mathcal S}||{\mathcal A}|}/{n}}$. However, under this second data acquisition protocol, minimizing a naive adaptation of the cross-entropy loss results in an asymptotically biased student. We overcome this limitation and achieve the fundamental limit by using a novel empirical variant of the squared error logit loss. The third level further equips the student with the soft labels (complete logits) on ${\mathcal A}$ given every sampled input, thereby provably enables the student to enjoy a rate ${|{\mathcal S}|}/{n}$ free of $|{\mathcal A}|$. We find any Kullback-Leibler divergence minimizer to be optimal in the last case. Numerical simulations distinguish the four learners and corroborate our theory.
    Discovering Hierarchical Achievements in Reinforcement Learning via Contrastive Learning. (arXiv:2307.03486v2 [cs.LG] UPDATED)
    Discovering achievements with a hierarchical structure in procedurally generated environments presents a significant challenge. This requires an agent to possess a broad range of abilities, including generalization and long-term reasoning. Many prior methods have been built upon model-based or hierarchical approaches, with the belief that an explicit module for long-term planning would be advantageous for learning hierarchical dependencies. However, these methods demand an excessive number of environment interactions or large model sizes, limiting their practicality. In this work, we demonstrate that proximal policy optimization (PPO), a simple yet versatile model-free algorithm, outperforms previous methods when optimized with recent implementation practices. Moreover, we find that the PPO agent can predict the next achievement to be unlocked to some extent, albeit with limited confidence. Based on this observation, we introduce a novel contrastive learning method, called achievement distillation, which strengthens the agent's ability to predict the next achievement. Our method exhibits a strong capacity for discovering hierarchical achievements and shows state-of-the-art performance on the challenging Crafter environment in a sample-efficient manner while utilizing fewer model parameters.
    Conformal inference for regression on Riemannian Manifolds. (arXiv:2310.08209v1 [stat.ML])
    Regression on manifolds, and, more broadly, statistics on manifolds, has garnered significant importance in recent years due to the vast number of applications for this type of data. Circular data is a classic example, but so is data in the space of covariance matrices, data on the Grassmannian manifold obtained as a result of principal component analysis, among many others. In this work we investigate prediction sets for regression scenarios when the response variable, denoted by $Y$, resides in a manifold, and the covariable, denoted by X, lies in Euclidean space. This extends the concepts delineated in [Lei and Wasserman, 2014] to this novel context. Aligning with traditional principles in conformal inference, these prediction sets are distribution-free, indicating that no specific assumptions are imposed on the joint distribution of $(X, Y)$, and they maintain a non-parametric character. We prove the asymptotic almost sure convergence of the empirical version of these regions on the manifold to their population counterparts. The efficiency of this method is shown through a comprehensive simulation study and an analysis involving real-world data.
    Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets. (arXiv:2310.04413v2 [cs.LG] UPDATED)
    Offline policy learning is aimed at learning decision-making policies using existing datasets of trajectories without collecting additional data. The primary motivation for using reinforcement learning (RL) instead of supervised learning techniques such as behavior cloning is to find a policy that achieves a higher average return than the trajectories constituting the dataset. However, we empirically find that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset. We argue this is due to an assumption made by current offline RL algorithms of staying close to the trajectories in the dataset. If the dataset primarily consists of sub-optimal trajectories, this assumption forces the policy to mimic the suboptimal actions. We overcome this issue by proposing a sampling strategy that enables the policy to only be constrained to ``good data" rather than all actions in the dataset (i.e., uniform sampling). We present a realization of the sampling strategy and an algorithm that can be used as a plug-and-play module in standard offline RL algorithms. Our evaluation demonstrates significant performance gains in 72 imbalanced datasets, D4RL dataset, and across three different offline RL algorithms. Code is available at https://github.com/Improbable-AI/dw-offline-rl.
    Provably Efficient Offline Goal-Conditioned Reinforcement Learning with General Function Approximation and Single-Policy Concentrability. (arXiv:2302.03770v2 [cs.LG] UPDATED)
    Goal-conditioned reinforcement learning (GCRL) refers to learning general-purpose skills that aim to reach diverse goals. In particular, offline GCRL only requires purely pre-collected datasets to perform training tasks without additional interactions with the environment. Although offline GCRL has become increasingly prevalent and many previous works have demonstrated its empirical success, the theoretical understanding of efficient offline GCRL algorithms is not well established, especially when the state space is huge and the offline dataset only covers the policy we aim to learn. In this paper, we provide a rigorous theoretical analysis of an existing empirically successful offline GCRL algorithm. We prove that under slight modification, this algorithm enjoys an $\widetilde{O}(\text{poly}(1/\epsilon))$ sample complexity (where $\epsilon$ is the desired suboptimality of the learned policy) with general function approximation thanks to the property of (semi-)strong convexity of the objective functions. We only require nearly minimal assumptions on the dataset (single-policy concentrability) and the function class (realizability). Moreover, this algorithm consists of two uninterleaved optimization steps, which we refer to as $V$-learning and policy learning, and is computationally stable since it does not involve minimax optimization. We also empirically validate our theory by showing that the modified algorithm outperforms the previous algorithm in various real-world environments. To the best of our knowledge, this is the first algorithm that is both provably efficient with general function approximation and single-policy concentrability, and empirically successful without requiring solving minimax optimization problems.  ( 3 min )
    Analyzing And Editing Inner Mechanisms Of Backdoored Language Models. (arXiv:2302.12461v2 [cs.LG] UPDATED)
    Poisoning of data sets is a potential security threat to large language models that can lead to backdoored models. A description of the internal mechanisms of backdoored language models and how they process trigger inputs, e.g., when switching to toxic language, has yet to be found. In this work, we study the internal representations of transformer-based backdoored language models and determine early-layer MLP modules as most important for the backdoor mechanism in combination with the initial embedding projection. We use this knowledge to remove, insert, and modify backdoor mechanisms with engineered replacements that reduce the MLP module outputs to essentials for the backdoor mechanism. To this end, we introduce PCP ablation, where we replace transformer modules with low-rank matrices based on the principal components of their activations. We demonstrate our results on backdoored toy, backdoored large, and non-backdoored open-source models. We show that we can improve the backdoor robustness of large language models by locally constraining individual modules during fine-tuning on potentially poisonous data sets. Trigger warning: Offensive language.
    Quantum-Enhanced Forecasting: Leveraging Quantum Gramian Angular Field and CNNs for Stock Return Predictions. (arXiv:2310.07427v2 [cs.LG] UPDATED)
    We propose a time series forecasting method named Quantum Gramian Angular Field (QGAF). This approach merges the advantages of quantum computing technology with deep learning, aiming to enhance the precision of time series classification and forecasting. We successfully transformed stock return time series data into two-dimensional images suitable for Convolutional Neural Network (CNN) training by designing specific quantum circuits. Distinct from the classical Gramian Angular Field (GAF) approach, QGAF's uniqueness lies in eliminating the need for data normalization and inverse cosine calculations, simplifying the transformation process from time series data to two-dimensional images. To validate the effectiveness of this method, we conducted experiments on datasets from three major stock markets: the China A-share market, the Hong Kong stock market, and the US stock market. Experimental results revealed that compared to the classical GAF method, the QGAF approach significantly improved time series prediction accuracy, reducing prediction errors by an average of 25% for Mean Absolute Error (MAE) and 48% for Mean Squared Error (MSE). This research confirms the potential and promising prospects of integrating quantum computing with deep learning techniques in financial time series forecasting.
    Contextualized Policy Recovery: Modeling and Interpreting Medical Decisions with Adaptive Imitation Learning. (arXiv:2310.07918v1 [cs.LG])
    Interpretable policy learning seeks to estimate intelligible decision policies from observed actions; however, existing models fall short by forcing a tradeoff between accuracy and interpretability. This tradeoff limits data-driven interpretations of human decision-making process. e.g. to audit medical decisions for biases and suboptimal practices, we require models of decision processes which provide concise descriptions of complex behaviors. Fundamentally, existing approaches are burdened by this tradeoff because they represent the underlying decision process as a universal policy, when in fact human decisions are dynamic and can change drastically with contextual information. Thus, we propose Contextualized Policy Recovery (CPR), which re-frames the problem of modeling complex decision processes as a multi-task learning problem in which complex decision policies are comprised of context-specific policies. CPR models each context-specific policy as a linear observation-to-action mapping, and generates new decision models $\textit{on-demand}$ as contexts are updated with new observations. CPR is compatible with fully offline and partially observable decision environments, and can be tailored to incorporate any recurrent black-box model or interpretable decision model. We assess CPR through studies on simulated and real data, achieving state-of-the-art performance on the canonical tasks of predicting antibiotic prescription in intensive care units ($+22\%$ AUROC vs. previous SOTA) and predicting MRI prescription for Alzheimer's patients ($+7.7\%$ AUROC vs. previous SOTA). With this improvement in predictive performance, CPR closes the accuracy gap between interpretable and black-box methods for policy learning, allowing high-resolution exploration and analysis of context-specific decision models.
    NuTime: Numerically Multi-Scaled Embedding for Large-Scale Time Series Pretraining. (arXiv:2310.07402v2 [cs.LG] UPDATED)
    Recent research on time-series self-supervised models shows great promise in learning semantic representations. However, it has been limited to small-scale datasets, e.g., thousands of temporal sequences. In this work, we make key technical contributions that are tailored to the numerical properties of time-series data and allow the model to scale to large datasets, e.g., millions of temporal sequences. We adopt the Transformer architecture by first partitioning the input into non-overlapping windows. Each window is then characterized by its normalized shape and two scalar values denoting the mean and standard deviation within each window. To embed scalar values that may possess arbitrary numerical scales to high-dimensional vectors, we propose a numerically multi-scaled embedding module enumerating all possible scales for the scalar values. The model undergoes pretraining using the proposed numerically multi-scaled embedding with a simple contrastive objective on a large-scale dataset containing over a million sequences. We study its transfer performance on a number of univariate and multivariate classification benchmarks. Our method exhibits remarkable improvement against previous representation learning approaches and establishes the new state of the art, even compared with domain-specific non-learning-based methods.
    MMTSA: Multimodal Temporal Segment Attention Network for Efficient Human Activity Recognition. (arXiv:2210.09222v2 [cs.CV] UPDATED)
    Multimodal sensors provide complementary information to develop accurate machine-learning methods for human activity recognition (HAR), but introduce significantly higher computational load, which reduces efficiency. This paper proposes an efficient multimodal neural architecture for HAR using an RGB camera and inertial measurement units (IMUs) called Multimodal Temporal Segment Attention Network (MMTSA). MMTSA first transforms IMU sensor data into a temporal and structure-preserving gray-scale image using the Gramian Angular Field (GAF), representing the inherent properties of human activities. MMTSA then applies a multimodal sparse sampling method to reduce data redundancy. Lastly, MMTSA adopts an inter-segment attention module for efficient multimodal fusion. Using three well-established public datasets, we evaluated MMTSA's effectiveness and efficiency in HAR. Results show that our method achieves superior performance improvements 11.13% of cross-subject F1-score on the MMAct dataset than the previous state-of-the-art (SOTA) methods. The ablation study and analysis suggest that MMTSA's effectiveness in fusing multimodal data for accurate HAR. The efficiency evaluation on an edge device showed that MMTSA achieved significantly better accuracy, lower computational load, and lower inference latency than SOTA methods.  ( 2 min )
    A Comprehensive Review on Tree Detection Methods Using Point Cloud and Aerial Imagery from Unmanned Aerial Vehicles. (arXiv:2309.16375v2 [cs.CV] CROSS LISTED)
    Unmanned Aerial Vehicles (UAVs) are considered cutting-edge technology with highly cost-effective and flexible usage scenarios. Although many papers have reviewed the application of UAVs in agriculture, the review of the application for tree detection is still insufficient. This paper focuses on tree detection methods applied to UAV data collected by UAVs. There are two kinds of data, the point cloud and the images, which are acquired by the Light Detection and Ranging (LiDAR) sensor and camera, respectively. Among the detection methods using point-cloud data, this paper mainly classifies these methods according to LiDAR and Digital Aerial Photography (DAP). For the detection methods using images directly, this paper reviews these methods by whether or not to use the Deep Learning (DL) method. Our review concludes and analyses the comparison and combination between the application of LiDAR-based and DAP-based point cloud data. The performance, relative merits, and application fields of the methods are also introduced. Meanwhile, this review counts the number of tree detection studies using different methods in recent years. From our statics, the detection task using DL methods on the image has become a mainstream trend as the number of DL-based detection researches increases to 45% of the total number of tree detection studies up to 2022. As a result, this review could help and guide researchers who want to carry out tree detection on specific forests and for farmers to use UAVs in managing agriculture production.
    Core-sets for Fair and Diverse Data Summarization. (arXiv:2310.08122v1 [cs.DS])
    We study core-set construction algorithms for the task of Diversity Maximization under fairness/partition constraint. Given a set of points $P$ in a metric space partitioned into $m$ groups, and given $k_1,\ldots,k_m$, the goal of this problem is to pick $k_i$ points from each group $i$ such that the overall diversity of the $k=\sum_i k_i$ picked points is maximized. We consider two natural diversity measures: sum-of-pairwise distances and sum-of-nearest-neighbor distances, and show improved core-set construction algorithms with respect to these measures. More precisely, we show the first constant factor core-set w.r.t. sum-of-pairwise distances whose size is independent of the size of the dataset and the aspect ratio. Second, we show the first core-set w.r.t. the sum-of-nearest-neighbor distances. Finally, we run several experiments showing the effectiveness of our core-set approach. In particular, we apply constrained diversity maximization to summarize a set of timed messages that takes into account the messages' recency. Specifically, the summary should include more recent messages compared to older ones. This is a real task in one of the largest communication platforms, affecting the experience of hundreds of millions daily active users. By utilizing our core-set method for this task, we achieve a 100x speed-up while losing the diversity by only a few percent. Moreover, our approach allows us to improve the space usage of the algorithm in the streaming setting.
    MetaBox: A Benchmark Platform for Meta-Black-Box Optimization with Reinforcement Learning. (arXiv:2310.08252v1 [cs.LG])
    Recently, Meta-Black-Box Optimization with Reinforcement Learning (MetaBBO-RL) has showcased the power of leveraging RL at the meta-level to mitigate manual fine-tuning of low-level black-box optimizers. However, this field is hindered by the lack of a unified benchmark. To fill this gap, we introduce MetaBox, the first benchmark platform expressly tailored for developing and evaluating MetaBBO-RL methods. MetaBox offers a flexible algorithmic template that allows users to effortlessly implement their unique designs within the platform. Moreover, it provides a broad spectrum of over 300 problem instances, collected from synthetic to realistic scenarios, and an extensive library of 19 baseline methods, including both traditional black-box optimizers and recent MetaBBO-RL methods. Besides, MetaBox introduces three standardized performance metrics, enabling a more thorough assessment of the methods. In a bid to illustrate the utility of MetaBox for facilitating rigorous evaluation and in-depth analysis, we carry out a wide-ranging benchmarking study on existing MetaBBO-RL methods. Our MetaBox is open-source and accessible at: https://github.com/GMC-DRL/MetaBox.
    Generative modeling of time-dependent densities via optimal transport and projection pursuit. (arXiv:2304.09663v2 [stat.ML] UPDATED)
    Motivated by the computational difficulties incurred by popular deep learning algorithms for the generative modeling of temporal densities, we propose a cheap alternative which requires minimal hyperparameter tuning and scales favorably to high dimensional problems. In particular, we use a projection-based optimal transport solver [Meng et al., 2019] to join successive samples and subsequently use transport splines [Chewi et al., 2020] to interpolate the evolving density. When the sampling frequency is sufficiently high, the optimal maps are close to the identity and are thus computationally efficient to compute. Moreover, the training process is highly parallelizable as all optimal maps are independent and can thus be learned simultaneously. Finally, the approach is based solely on numerical linear algebra rather than minimizing a nonconvex objective function, allowing us to easily analyze and control the algorithm. We present several numerical experiments on both synthetic and real-world datasets to demonstrate the efficiency of our method. In particular, these experiments show that the proposed approach is highly competitive compared with state-of-the-art normalizing flows conditioned on time across a wide range of dimensionalities.
    An interpretable neural network-based non-proportional odds model for ordinal regression. (arXiv:2303.17823v3 [stat.ME] UPDATED)
    This study proposes an interpretable neural network-based non-proportional odds model (N$^3$POM) for ordinal regression. N$^3$POM is different from conventional approaches to ordinal regression with non-proportional models in several ways: (1) N$^3$POM is designed to directly handle continuous responses, whereas standard methods typically treat de facto ordered continuous variables as discrete, (2) instead of estimating response-dependent finite coefficients of linear models from discrete responses as is done in conventional approaches, we train a non-linear neural network to serve as a coefficient function. Thanks to the neural network, N$^3$POM offers flexibility while preserving the interpretability of conventional ordinal regression. We establish a sufficient condition under which the predicted conditional cumulative probability locally satisfies the monotonicity constraint over a user-specified region in the covariate space. Additionally, we provide a monotonicity-preserving stochastic (MPS) algorithm for effectively training the neural network. We apply N$^3$POM to several real-world datasets.
    Bitrate-Constrained DRO: Beyond Worst Case Robustness To Unknown Group Shifts. (arXiv:2302.02931v2 [cs.LG] UPDATED)
    Training machine learning models robust to distribution shifts is critical for real-world applications. Some robust training algorithms (e.g., Group DRO) specialize to group shifts and require group information on all training points. Other methods (e.g., CVaR DRO) that do not need group annotations can be overly conservative, since they naively upweight high loss points which may form a contrived set that does not correspond to any meaningful group in the real world (e.g., when the high loss points are randomly mislabeled training points). In this work, we address limitations in prior approaches by assuming a more nuanced form of group shift: conditioned on the label, we assume that the true group function (indicator over group) is simple. For example, we may expect that group shifts occur along low bitrate features (e.g., image background, lighting). Thus, we aim to learn a model that maintains high accuracy on simple group functions realized by these low bitrate features, that need not spend valuable model capacity achieving high accuracy on contrived groups of examples. Based on this, we consider the two-player game formulation of DRO where the adversary's capacity is bitrate-constrained. Our resulting practical algorithm, Bitrate-Constrained DRO (BR-DRO), does not require group information on training samples yet matches the performance of Group DRO on datasets that have training group annotations and that of CVaR DRO on long-tailed distributions. Our theoretical analysis reveals that in some settings BR-DRO objective can provably yield statistically efficient and less conservative solutions than unconstrained CVaR DRO.
    Explainable Attention for Few-shot Learning and Beyond. (arXiv:2310.07800v1 [cs.AI])
    Attention mechanisms have exhibited promising potential in enhancing learning models by identifying salient portions of input data. This is particularly valuable in scenarios where limited training samples are accessible due to challenges in data collection and labeling. Drawing inspiration from human recognition processes, we posit that an AI baseline's performance could be more accurate and dependable if it is exposed to essential segments of raw data rather than the entire input dataset, akin to human perception. However, the task of selecting these informative data segments, referred to as hard attention finding, presents a formidable challenge. In situations with few training samples, existing studies struggle to locate such informative regions due to the large number of training parameters that cannot be effectively learned from the available limited samples. In this study, we introduce a novel and practical framework for achieving explainable hard attention finding, specifically tailored for few-shot learning scenarios, called FewXAT. Our approach employs deep reinforcement learning to implement the concept of hard attention, directly impacting raw input data and thus rendering the process interpretable for human understanding. Through extensive experimentation across various benchmark datasets, we demonstrate the efficacy of our proposed method.
    Reinforcement Learning of Display Transfer Robots in Glass Flow Control Systems: A Physical Simulation-Based Approach. (arXiv:2310.07981v1 [cs.LG])
    A flow control system is a critical concept for increasing the production capacity of manufacturing systems. To solve the scheduling optimization problem related to the flow control with the aim of improving productivity, existing methods depend on a heuristic design by domain human experts. Therefore, the methods require correction, monitoring, and verification by using real equipment. As system designs increase in complexity, the monitoring time increases, which decreases the probability of arriving at the optimal design. As an alternative approach to the heuristic design of flow control systems, the use of deep reinforcement learning to solve the scheduling optimization problem has been considered. Although the existing research on reinforcement learning has yielded excellent performance in some areas, the applicability of the results to actual FAB such as display and semiconductor manufacturing processes is not evident so far. To this end, we propose a method to implement a physical simulation environment and devise a feasible flow control system design using a transfer robot in display manufacturing through reinforcement learning. We present a model and parameter setting to build a virtual environment for different display transfer robots, and training methods of reinforcement learning on the environment to obtain an optimal scheduling of glass flow control systems. Its feasibility was verified by using different types of robots used in the actual process.
    Identifying latent distances with Finslerian geometry. (arXiv:2212.10010v2 [cs.LG] UPDATED)
    Riemannian geometry provides us with powerful tools to explore the latent space of generative models while preserving the underlying structure of the data. The latent space can be equipped it with a Riemannian metric, pulled back from the data manifold. With this metric, we can systematically navigate the space relying on geodesics defined as the shortest curves between two points. Generative models are often stochastic, causing the data space, the Riemannian metric, and the geodesics, to be stochastic as well. Stochastic objects are at best impractical, and at worst impossible, to manipulate. A common solution is to approximate the stochastic pullback metric by its expectation. But the geodesics derived from this expected Riemannian metric do not correspond to the expected length-minimising curves. In this work, we propose another metric whose geodesics explicitly minimise the expected length of the pullback metric. We show this metric defines a Finsler metric, and we compare it with the expected Riemannian metric. In high dimensions, we prove that both metrics converge to each other at a rate of $O\left(\frac{1}{D}\right)$. This convergence implies that the established expected Riemannian metric is an accurate approximation of the theoretically more grounded Finsler metric. This provides justification for using the expected Riemannian metric for practical implementations.
    Theoretical Hardness and Tractability of POMDPs in RL with Partial Online State Information. (arXiv:2306.08762v2 [cs.LG] UPDATED)
    Partially observable Markov decision processes (POMDPs) have been widely applied to capture many real-world applications. However, existing theoretical results have shown that learning in general POMDPs could be intractable, where the main challenge lies in the lack of latent state information. A key fundamental question here is how much online state information (OSI) is sufficient to achieve tractability. In this paper, we establish a lower bound that reveals a surprising hardness result: unless we have full OSI, we need an exponentially scaling sample complexity to obtain an $\epsilon$-optimal policy solution for POMDPs. Nonetheless, inspired by the key insights in our lower bound design, we find that there exist important tractable classes of POMDPs even with only partial OSI. In particular, for two novel classes of POMDPs with partial OSI, we provide new algorithms that are proved to be near-optimal by establishing new regret upper and lower bounds.
    Infinite Width Graph Neural Networks for Node Regression/ Classification. (arXiv:2310.08176v1 [cs.LG])
    This work analyzes Graph Neural Networks, a generalization of Fully-Connected Deep Neural Nets on Graph structured data, when their width, that is the number of nodes in each fullyconnected layer is increasing to infinity. Infinite Width Neural Networks are connecting Deep Learning to Gaussian Processes and Kernels, both Machine Learning Frameworks with long traditions and extensive theoretical foundations. Gaussian Processes and Kernels have much less hyperparameters then Neural Networks and can be used for uncertainty estimation, making them more user friendly for applications. This works extends the increasing amount of research connecting Gaussian Processes and Kernels to Neural Networks. The Kernel and Gaussian Process closed forms are derived for a variety of architectures, namely the standard Graph Neural Network, the Graph Neural Network with Skip-Concatenate Connections and the Graph Attention Neural Network. All architectures are evaluated on a variety of datasets on the task of transductive Node Regression and Classification. Additionally, a Spectral Sparsification method known as Effective Resistance is used to improve runtime and memory requirements. Extending the setting to inductive graph learning tasks (Graph Regression/ Classification) is straightforward and is briefly discussed in 3.5.
    Tight Time-Space Lower Bounds for Constant-Pass Learning. (arXiv:2310.08070v1 [cs.LG])
    In his breakthrough paper, Raz showed that any parity learning algorithm requires either quadratic memory or an exponential number of samples [FOCS'16, JACM'19]. A line of work that followed extended this result to a large class of learning problems. Until recently, all these results considered learning in the streaming model, where each sample is drawn independently, and the learner is allowed a single pass over the stream of samples. Garg, Raz, and Tal [CCC'19] considered a stronger model, allowing multiple passes over the stream. In the $2$-pass model, they showed that learning parities of size $n$ requires either a memory of size $n^{1.5}$ or at least $2^{\sqrt{n}}$ samples. (Their result also generalizes to other learning problems.) In this work, for any constant $q$, we prove tight memory-sample lower bounds for any parity learning algorithm that makes $q$ passes over the stream of samples. We show that such a learner requires either $\Omega(n^{2})$ memory size or at least $2^{\Omega(n)}$ samples. Beyond establishing a tight lower bound, this is the first non-trivial lower bound for $q$-pass learning for any $q\ge 3$. Similar to prior work, our results extend to any learning problem with many nearly-orthogonal concepts. We complement the lower bound with an upper bound, showing that parity learning with $q$ passes can be done efficiently with $O(n^2/\log q)$ memory.
    AutoFHE: Automated Adaption of CNNs for Efficient Evaluation over FHE. (arXiv:2310.08012v1 [cs.LG])
    Secure inference of deep convolutional neural networks (CNNs) under RNS-CKKS involves polynomial approximation of unsupported non-linear activation functions. However, existing approaches have three main limitations: 1) Inflexibility: The polynomial approximation and associated homomorphic evaluation architecture are customized manually for each CNN architecture and do not generalize to other networks. 2) Suboptimal Approximation: Each activation function is approximated instead of the function represented by the CNN. 3) Restricted Design: Either high-degree or low-degree polynomial approximations are used. The former retains high accuracy but slows down inference due to bootstrapping operations, while the latter accelerates ciphertext inference but compromises accuracy. To address these limitations, we present AutoFHE, which automatically adapts standard CNNs for secure inference under RNS-CKKS. The key idea is to adopt layerwise mixed-degree polynomial activation functions, which are optimized jointly with the homomorphic evaluation architecture in terms of the placement of bootstrapping operations. The problem is modeled within a multi-objective optimization framework to maximize accuracy and minimize the number of bootstrapping operations. AutoFHE can be applied flexibly on any CNN architecture, and it provides diverse solutions that span the trade-off between accuracy and latency. Experimental evaluation over RNS-CKKS encrypted CIFAR datasets shows that AutoFHE accelerates secure inference by $1.32\times$ to $1.8\times$ compared to methods employing high-degree polynomials. It also improves accuracy by up to 2.56% compared to methods using low-degree polynomials. Lastly, AutoFHE accelerates inference and improves accuracy by $103\times$ and 3.46%, respectively, compared to CNNs under TFHE.
    Impact of multi-armed bandit strategies on deep recurrent reinforcement learning. (arXiv:2310.08331v1 [stat.ML])
    Incomplete knowledge of the environment leads an agent to make decisions under uncertainty. One of the major dilemmas in Reinforcement Learning (RL) where an autonomous agent has to balance two contrasting needs in making its decisions is: exploiting the current knowledge of the environment to maximize the cumulative reward as well as exploring actions that allow improving the knowledge of the environment, hopefully leading to higher reward values (exploration-exploitation trade-off). Concurrently, another relevant issue regards the full observability of the states, which may not be assumed in all applications. Such as when only 2D images are considered as input in a RL approach used for finding the optimal action within a 3D simulation environment. In this work, we address these issues by deploying and testing several techniques to balance exploration and exploitation trade-off on partially observable systems for predicting steering wheels in autonomous driving scenario. More precisely, the final aim is to investigate the effects of using both stochastic and deterministic multi-armed bandit strategies coupled with a Deep Recurrent Q-Network. Additionally, we adapted and evaluated the impact of an innovative method to improve the learning phase of the underlying Convolutional Recurrent Neural Network. We aim to show that adaptive stochastic methods for exploration better approximate the trade-off between exploration and exploitation as, in general, Softmax and Max-Boltzmann strategies are able to outperform epsilon-greedy techniques.
    Generative Intrinsic Optimization: Intrisic Control with Model Learning. (arXiv:2310.08100v1 [cs.LG])
    Future sequence represents the outcome after executing the action into the environment. When driven by the information-theoretic concept of mutual information, it seeks maximally informative consequences. Explicit outcomes may vary across state, return, or trajectory serving different purposes such as credit assignment or imitation learning. However, the inherent nature of incorporating intrinsic motivation with reward maximization is often neglected. In this work, we propose a variational approach to jointly learn the necessary quantity for estimating the mutual information and the dynamics model, providing a general framework for incorporating different forms of outcomes of interest. Integrated into a policy iteration scheme, our approach guarantees convergence to the optimal policy. While we mainly focus on theoretical analysis, our approach opens the possibilities of leveraging intrinsic control with model learning to enhance sample efficiency and incorporate uncertainty of the environment into decision-making.
    Efficient Integrators for Diffusion Generative Models. (arXiv:2310.07894v1 [cs.LG])
    Diffusion models suffer from slow sample generation at inference time. Therefore, developing a principled framework for fast deterministic/stochastic sampling for a broader class of diffusion models is a promising direction. We propose two complementary frameworks for accelerating sample generation in pre-trained models: Conjugate Integrators and Splitting Integrators. Conjugate integrators generalize DDIM, mapping the reverse diffusion dynamics to a more amenable space for sampling. In contrast, splitting-based integrators, commonly used in molecular dynamics, reduce the numerical simulation error by cleverly alternating between numerical updates involving the data and auxiliary variables. After extensively studying these methods empirically and theoretically, we present a hybrid method that leads to the best-reported performance for diffusion models in augmented spaces. Applied to Phase Space Langevin Diffusion [Pandey & Mandt, 2023] on CIFAR-10, our deterministic and stochastic samplers achieve FID scores of 2.11 and 2.36 in only 100 network function evaluations (NFE) as compared to 2.57 and 2.63 for the best-performing baselines, respectively. Our code and model checkpoints will be made publicly available at \url{https://github.com/mandt-lab/PSLD}.
    The Thousand Faces of Explainable AI Along the Machine Learning Life Cycle: Industrial Reality and Current State of Research. (arXiv:2310.07882v1 [cs.LG])
    In this paper, we investigate the practical relevance of explainable artificial intelligence (XAI) with a special focus on the producing industries and relate them to the current state of academic XAI research. Our findings are based on an extensive series of interviews regarding the role and applicability of XAI along the Machine Learning (ML) lifecycle in current industrial practice and its expected relevance in the future. The interviews were conducted among a great variety of roles and key stakeholders from different industry sectors. On top of that, we outline the state of XAI research by providing a concise review of the relevant literature. This enables us to provide an encompassing overview covering the opinions of the surveyed persons as well as the current state of academic research. By comparing our interview results with the current research approaches we reveal several discrepancies. While a multitude of different XAI approaches exists, most of them are centered around the model evaluation phase and data scientists. Their versatile capabilities for other stages are currently either not sufficiently explored or not popular among practitioners. In line with existing work, our findings also confirm that more efforts are needed to enable also non-expert users' interpretation and understanding of opaque AI models with existing methods and frameworks.
    Seeing-Eye Quadruped Navigation with Force Responsive Locomotion Control. (arXiv:2309.04370v2 [cs.RO] UPDATED)
    Seeing-eye robots are very useful tools for guiding visually impaired people, potentially producing a huge societal impact given the low availability and high cost of real guide dogs. Although a few seeing-eye robot systems have already been demonstrated, none considered external tugs from humans, which frequently occur in a real guide dog setting. In this paper, we simultaneously train a locomotion controller that is robust to external tugging forces via Reinforcement Learning (RL), and an external force estimator via supervised learning. The controller ensures stable walking, and the force estimator enables the robot to respond to the external forces from the human. These forces are used to guide the robot to the global goal, which is unknown to the robot, while the robot guides the human around nearby obstacles via a local planner. Experimental results in simulation and on hardware show that our controller is robust to external forces, and our seeing-eye system can accurately detect force direction. We demonstrate our full seeing-eye robot system on a real quadruped robot with a blindfolded human. The video can be seen at our project page: https://bu-air-lab.github.io/guide_dog/
    LLM4TS: Two-Stage Fine-Tuning for Time-Series Forecasting with Pre-Trained LLMs. (arXiv:2308.08469v3 [cs.LG] UPDATED)
    In this work, we leverage pre-trained Large Language Models (LLMs) to enhance time-series forecasting. Mirroring the growing interest in unifying models for Natural Language Processing and Computer Vision, we envision creating an analogous model for long-term time-series forecasting. Due to limited large-scale time-series data for building robust foundation models, our approach LLM4TS focuses on leveraging the strengths of pre-trained LLMs. By combining time-series patching with temporal encoding, we have enhanced the capability of LLMs to handle time-series data effectively. Inspired by the supervised fine-tuning in chatbot domains, we prioritize a two-stage fine-tuning process: first conducting supervised fine-tuning to orient the LLM towards time-series data, followed by task-specific downstream fine-tuning. Furthermore, to unlock the flexibility of pre-trained LLMs without extensive parameter adjustments, we adopt several Parameter-Efficient Fine-Tuning (PEFT) techniques. Drawing on these innovations, LLM4TS has yielded state-of-the-art results in long-term forecasting. Our model has also shown exceptional capabilities as both a robust representation learner and an effective few-shot learner, thanks to the knowledge transferred from the pre-trained LLM.
    Multi-Objective Optimization for Sparse Deep Neural Network Training. (arXiv:2308.12243v2 [cs.LG] UPDATED)
    Different conflicting optimization criteria arise naturally in various Deep Learning scenarios. These can address different main tasks (i.e., in the setting of Multi-Task Learning), but also main and secondary tasks such as loss minimization versus sparsity. The usual approach is a simple weighting of the criteria, which formally only works in the convex setting. In this paper, we present a Multi-Objective Optimization algorithm using a modified Weighted Chebyshev scalarization for training Deep Neural Networks (DNNs) with respect to several tasks. By employing this scalarization technique, the algorithm can identify all optimal solutions of the original problem while reducing its complexity to a sequence of single-objective problems. The simplified problems are then solved using an Augmented Lagrangian method, enabling the use of popular optimization techniques such as Adam and Stochastic Gradient Descent, while efficaciously handling constraints. Our work aims to address the (economical and also ecological) sustainability issue of DNN models, with a particular focus on Deep Multi-Task models, which are typically designed with a very large number of weights to perform equally well on multiple tasks. Through experiments conducted on two Machine Learning datasets, we demonstrate the possibility of adaptively sparsifying the model during training without significantly impacting its performance, if we are willing to apply task-specific adaptations to the network weights. Code is available at https://github.com/salomonhotegni/MDMTN.
    Continual Learning via Manifold Expansion Replay. (arXiv:2310.08038v1 [cs.LG])
    In continual learning, the learner learns multiple tasks in sequence, with data being acquired only once for each task. Catastrophic forgetting is a major challenge to continual learning. To reduce forgetting, some existing rehearsal-based methods use episodic memory to replay samples of previous tasks. However, in the process of knowledge integration when learning a new task, this strategy also suffers from catastrophic forgetting due to an imbalance between old and new knowledge. To address this problem, we propose a novel replay strategy called Manifold Expansion Replay (MaER). We argue that expanding the implicit manifold of the knowledge representation in the episodic memory helps to improve the robustness and expressiveness of the model. To this end, we propose a greedy strategy to keep increasing the diameter of the implicit manifold represented by the knowledge in the buffer during memory management. In addition, we introduce Wasserstein distance instead of cross entropy as distillation loss to preserve previous knowledge. With extensive experimental validation on MNIST, CIFAR10, CIFAR100, and TinyImageNet, we show that the proposed method significantly improves the accuracy in continual learning setup, outperforming the state of the arts.
    MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback. (arXiv:2309.10691v2 [cs.CL] UPDATED)
    To solve complex tasks, large language models (LLMs) often require multiple rounds of interactions with the user, sometimes assisted by external tools. However, current evaluation protocols often emphasize benchmark performance with single-turn exchanges, neglecting the nuanced interactions among the user, LLMs, and external tools, while also underestimating the importance of natural language feedback from users. These oversights contribute to discrepancies between research benchmark evaluations and real-world use cases. We introduce MINT, a benchmark that evaluates LLMs' ability to solve tasks with multi-turn interactions by (1) using tools and (2) leveraging natural language feedback. To ensure reproducibility, we provide an evaluation framework where LLMs can access tools by executing Python code and receive users' natural language feedback simulated by GPT-4. We repurpose a diverse set of established evaluation datasets focusing on reasoning, coding, and decision-making and carefully curate them into a compact subset for efficient evaluation. Our analysis of 20 open- and closed-source LLMs offers intriguing findings. (a) LLMs generally benefit from tools and language feedback, with performance gains (absolute, same below) of 1-8% for each turn of tool use and 2-17% with natural language feedback. (b) Better single-turn performance does not guarantee better multi-turn performance. (c) Surprisingly, on the LLMs evaluated, supervised instruction-finetuning (SIFT) and reinforcement learning from human feedback (RLHF) generally hurt multi-turn capabilities. We expect MINT can help measure progress and incentivize research in improving LLMs' capabilities in multi-turn interactions, especially for open-source communities where multi-turn human evaluation can be less accessible compared to commercial LLMs with a larger user base.
    Explorable Mesh Deformation Subspaces from Unstructured Generative Models. (arXiv:2310.07814v1 [cs.GR])
    Exploring variations of 3D shapes is a time-consuming process in traditional 3D modeling tools. Deep generative models of 3D shapes often feature continuous latent spaces that can, in principle, be used to explore potential variations starting from a set of input shapes. In practice, doing so can be problematic: latent spaces are high dimensional and hard to visualize, contain shapes that are not relevant to the input shapes, and linear paths through them often lead to sub-optimal shape transitions. Furthermore, one would ideally be able to explore variations in the original high-quality meshes used to train the generative model, not its lower-quality output geometry. In this paper, we present a method to explore variations among a given set of landmark shapes by constructing a mapping from an easily-navigable 2D exploration space to a subspace of a pre-trained generative model. We first describe how to find a mapping that spans the set of input landmark shapes and exhibits smooth variations between them. We then show how to turn the variations in this subspace into deformation fields, to transfer those variations to high-quality meshes for the landmark shapes. Our results show that our method can produce visually-pleasing and easily-navigable 2D exploration spaces for several different shape categories, especially as compared to prior work on learning deformation spaces for 3D shapes.
    Accountability in Offline Reinforcement Learning: Explaining Decisions with a Corpus of Examples. (arXiv:2310.07747v1 [cs.LG])
    Learning transparent, interpretable controllers with offline data in decision-making systems is an essential area of research due to its potential to reduce the risk of applications in real-world systems. However, in responsibility-sensitive settings such as healthcare, decision accountability is of paramount importance, yet has not been adequately addressed by the literature. This paper introduces the Accountable Offline Controller (AOC) that employs the offline dataset as the Decision Corpus and performs accountable control based on a tailored selection of examples, referred to as the Corpus Subset. ABC operates effectively in low-data scenarios, can be extended to the strictly offline imitation setting, and displays qualities of both conservation and adaptability. We assess ABC's performance in both simulated and real-world healthcare scenarios, emphasizing its capability to manage offline control tasks with high levels of performance while maintaining accountability. Keywords: Interpretable Reinforcement Learning, Explainable Reinforcement Learning, Reinforcement Learning Transparency, Offline Reinforcement Learning, Batched Control.
    A Complete Recipe for Diffusion Generative Models. (arXiv:2303.01748v2 [cs.LG] UPDATED)
    Score-based Generative Models (SGMs) have demonstrated exceptional synthesis outcomes across various tasks. However, the current design landscape of the forward diffusion process remains largely untapped and often relies on physical heuristics or simplifying assumptions. Utilizing insights from the development of scalable Bayesian posterior samplers, we present a complete recipe for formulating forward processes in SGMs, ensuring convergence to the desired target distribution. Our approach reveals that several existing SGMs can be seen as specific manifestations of our framework. Building upon this method, we introduce Phase Space Langevin Diffusion (PSLD), which relies on score-based modeling within an augmented space enriched by auxiliary variables akin to physical phase space. Empirical results exhibit the superior sample quality and improved speed-quality trade-off of PSLD compared to various competing approaches on established image synthesis benchmarks. Remarkably, PSLD achieves sample quality akin to state-of-the-art SGMs (FID: 2.10 for unconditional CIFAR-10 generation). Lastly, we demonstrate the applicability of PSLD in conditional synthesis using pre-trained score networks, offering an appealing alternative as an SGM backbone for future advancements. Code and model checkpoints can be accessed at \url{https://github.com/mandt-lab/PSLD}.
    Interpreting Reward Models in RLHF-Tuned Language Models Using Sparse Autoencoders. (arXiv:2310.08164v1 [cs.LG])
    Large language models (LLMs) aligned to human preferences via reinforcement learning from human feedback (RLHF) underpin many commercial applications. However, how RLHF impacts LLM internals remains opaque. We propose a novel method to interpret learned reward functions in RLHF-tuned LLMs using sparse autoencoders. Our approach trains autoencoder sets on activations from a base LLM and its RLHF-tuned version. By comparing autoencoder hidden spaces, we identify unique features that reflect the accuracy of the learned reward model. To quantify this, we construct a scenario where the tuned LLM learns token-reward mappings to maximize reward. This is the first application of sparse autoencoders for interpreting learned rewards and broadly inspecting reward learning in LLMs. Our method provides an abstract approximation of reward integrity. This presents a promising technique for ensuring alignment between specified objectives and model behaviors.
    Score Regularized Policy Optimization through Diffusion Behavior. (arXiv:2310.07297v2 [cs.LG] UPDATED)
    Recent developments in offline reinforcement learning have uncovered the immense potential of diffusion modeling, which excels at representing heterogeneous behavior policies. However, sampling from diffusion policies is considerably slow because it necessitates tens to hundreds of iterative inference steps for one action. To address this issue, we propose to extract an efficient deterministic inference policy from critic models and pretrained diffusion behavior models, leveraging the latter to directly regularize the policy gradient with the behavior distribution's score function during optimization. Our method enjoys powerful generative capabilities of diffusion modeling while completely circumventing the computationally intensive and time-consuming diffusion sampling scheme, both during training and evaluation. Extensive results on D4RL tasks show that our method boosts action sampling speed by more than 25 times compared with various leading diffusion-based methods in locomotion tasks, while still maintaining state-of-the-art performance.
    Only Pay for What Is Uncertain: Variance-Adaptive Thompson Sampling. (arXiv:2303.09033v2 [cs.LG] UPDATED)
    Most bandit algorithms assume that the reward variances or their upper bounds are known, and that they are the same for all arms. This naturally leads to suboptimal performance and higher regret due to variance overestimation. On the other hand, underestimated reward variances may lead to linear regret due to committing early to a suboptimal arm. This motivated prior works on variance-adaptive frequentist algorithms, which have strong instance-dependent regret bounds but cannot incorporate prior knowledge on reward variances. We lay foundations for the Bayesian setting, which incorporates prior knowledge. This results in lower regret in practice, due to using the prior in the algorithm design, and also improved regret guarantees. Specifically, we study Gaussian bandits with {unknown heterogeneous reward variances}, and develop a Thompson sampling algorithm with prior-dependent Bayes regret bounds. We achieve lower regret with lower reward variances and more informative priors on them, which is precisely why we pay only for what is uncertain. This is the first result of its kind. Finally, we corroborate our theory with extensive experiments, which show the superiority of our variance-adaptive Bayesian algorithm over prior frequentist approaches. We also show that our approach is robust to model misspecification and can be applied with estimated priors.
    Extreme Image Transformations Facilitate Robust Latent Object Representations. (arXiv:2310.07725v1 [cs.LG])
    Adversarial attacks can affect the object recognition capabilities of machines in wild. These can often result from spurious correlations between input and class labels, and are prone to memorization in large networks. While networks are expected to do automated feature selection, it is not effective at the scale of the object. Humans, however, are able to select the minimum set of features required to form a robust representation of an object. In this work, we show that finetuning any pretrained off-the-shelf network with Extreme Image Transformations (EIT) not only helps in learning a robust latent representation, it also improves the performance of these networks against common adversarial attacks of various intensities. Our EIT trained networks show strong activations in the object regions even when tested with more intense noise, showing promising generalizations across different kinds of adversarial attacks.
    Physics Constrained Unsupervised Deep Learning for Rapid, High Resolution Scanning Coherent Diffraction Reconstruction. (arXiv:2306.11014v2 [physics.comp-ph] UPDATED)
    By circumventing the resolution limitations of optics, coherent diffractive imaging (CDI) and ptychography are making their way into scientific fields ranging from X-ray imaging to astronomy. Yet, the need for time consuming iterative phase recovery hampers real-time imaging. While supervised deep learning strategies have increased reconstruction speed, they sacrifice image quality. Furthermore, these methods' demand for extensive labeled training data is experimentally burdensome. Here, we propose an unsupervised physics-informed neural network reconstruction method, PtychoPINN, that retains the factor of 100-to-1000 speedup of deep learning-based reconstruction while improving reconstruction quality by combining the diffraction forward map with real-space constraints from overlapping measurements. In particular, PtychoPINN significantly advances generalizability, accuracy (with a typical 10 dB PSNR increase), and linear resolution (2- to 6-fold gain). This blend of performance and speed offers exciting prospects for high-resolution real-time imaging in high-throughput environments such as X-ray free electron lasers (XFELs) and diffraction-limited light sources.
    In-Context Unlearning: Language Models as Few Shot Unlearners. (arXiv:2310.07579v2 [cs.LG] UPDATED)
    Machine unlearning, the study of efficiently removing the impact of specific training points on the trained model, has garnered increased attention of late, driven by the need to comply with privacy regulations like the Right to be Forgotten. Although unlearning is particularly relevant for LLMs in light of the copyright issues they raise, achieving precise unlearning is computationally infeasible for very large models. To this end, recent work has proposed several algorithms which approximate the removal of training data without retraining the model. These algorithms crucially rely on access to the model parameters in order to update them, an assumption that may not hold in practice due to computational constraints or when the LLM is accessed via API. In this work, we propose a new class of unlearning methods for LLMs we call ''In-Context Unlearning'', providing inputs in context and without having to update model parameters. To unlearn a particular training instance, we provide the instance alongside a flipped label and additional correctly labelled instances which are prepended as inputs to the LLM at inference time. Our experimental results demonstrate that these contexts effectively remove specific information from the training set while maintaining performance levels that are competitive with (or in some cases exceed) state-of-the-art unlearning methods that require access to the LLM parameters.
    Lifelong Audio-video Masked Autoencoder with Forget-robust Localized Alignments. (arXiv:2310.08204v1 [cs.CV])
    We present a lifelong audio-video masked autoencoder that continually learns the multimodal representations from a video stream containing audio-video pairs, while its distribution continually shifts over time. Specifically, we propose two novel ideas to tackle the problem: (1) Localized Alignment: We introduce a small trainable multimodal encoder that predicts the audio and video tokens that are well-aligned with each other. This allows the model to learn only the highly correlated audiovisual patches with accurate multimodal relationships. (2) Forget-robust multimodal patch selection: We compare the relative importance of each audio-video patch between the current and past data pair to mitigate unintended drift of the previously learned audio-video representations. Our proposed method, FLAVA (Forget-robust Localized Audio-Video Alignment), therefore, captures the complex relationships between the audio and video modalities during training on a sequence of pre-training tasks while alleviating the forgetting of learned audiovisual correlations. Our experiments validate that FLAVA outperforms the state-of-the-art continual learning methods on several benchmark datasets under continual audio-video representation learning scenarios.
    Impact of Co-occurrence on Factual Knowledge of Large Language Models. (arXiv:2310.08256v1 [cs.CL])
    Large language models (LLMs) often make factually incorrect responses despite their success in various applications. In this paper, we hypothesize that relying heavily on simple co-occurrence statistics of the pre-training corpora is one of the main factors that cause factual errors. Our results reveal that LLMs are vulnerable to the co-occurrence bias, defined as preferring frequently co-occurred words over the correct answer. Consequently, LLMs struggle to recall facts whose subject and object rarely co-occur in the pre-training dataset although they are seen during finetuning. We show that co-occurrence bias remains despite scaling up model sizes or finetuning. Therefore, we suggest finetuning on a debiased dataset to mitigate the bias by filtering out biased samples whose subject-object co-occurrence count is high. Although debiased finetuning allows LLMs to memorize rare facts in the training set, it is not effective in recalling rare facts unseen during finetuning. Further research in mitigation will help build reliable language models by preventing potential errors. The code is available at \url{https://github.com/CheongWoong/impact_of_cooccurrence}.
    Observatory: Characterizing Embeddings of Relational Tables. (arXiv:2310.07736v1 [cs.DB])
    Language models and specialized table embedding models have recently demonstrated strong performance on many tasks over tabular data. Researchers and practitioners are keen to leverage these models in many new application contexts; but limited understanding of the strengths and weaknesses of these models, and the table representations they generate, makes the process of finding a suitable model for a given task reliant on trial and error. There is an urgent need to gain a comprehensive understanding of these models to minimize inefficiency and failures in downstream usage. To address this need, we propose Observatory, a formal framework to systematically analyze embedding representations of relational tables. Motivated both by invariants of the relational data model and by statistical considerations regarding data distributions, we define eight primitive properties, and corresponding measures to quantitatively characterize table embeddings for these properties. Based on these properties, we define an extensible framework to evaluate language and table embedding models. We collect and synthesize a suite of datasets and use Observatory to analyze seven such models. Our analysis provides insights into the strengths and weaknesses of learned representations over tables. We find, for example, that some models are sensitive to table structure such as column order, that functional dependencies are rarely reflected in embeddings, and that specialized table embedding models have relatively lower sample fidelity. Such insights help researchers and practitioners better anticipate model behaviors and select appropriate models for their downstream tasks, while guiding researchers in the development of new models.
    Understanding Sparse Feature Updates in Deep Networks using Iterative Linearisation. (arXiv:2211.12345v4 [cs.LG] UPDATED)
    Larger and deeper networks generalise well despite their increased capacity to overfit. Understanding why this happens is theoretically and practically important. One recent approach looks at the infinitely wide limits of such networks and their corresponding kernels. However, these theoretical tools cannot fully explain finite networks as the empirical kernel changes significantly during gradient-descent-based training in contrast to infinite networks. In this work, we derive an iterative linearised training method as a novel empirical tool to further investigate this distinction, allowing us to control for sparse (i.e. infrequent) feature updates and quantify the frequency of feature learning needed to achieve comparable performance. We justify iterative linearisation as an interpolation between a finite analog of the infinite width regime, which does not learn features, and standard gradient descent training, which does. Informally, we also show that it is analogous to a damped version of the Gauss-Newton algorithm -- a second-order method. We show that in a variety of cases, iterative linearised training surprisingly performs on par with standard training, noting in particular how much less frequent feature learning is required to achieve comparable performance. We also show that feature learning is essential for good performance. Since such feature learning inevitably causes changes in the NTK kernel, we provide direct negative evidence for the NTK theory, which states the NTK kernel remains constant during training.
    Dealing with zero-inflated data: achieving SOTA with a two-fold machine learning approach. (arXiv:2310.08088v1 [cs.LG])
    In many cases, a machine learning model must learn to correctly predict a few data points with particular values of interest in a broader range of data where many target values are zero. Zero-inflated data can be found in diverse scenarios, such as lumpy and intermittent demands, power consumption for home appliances being turned on and off, impurities measurement in distillation processes, and even airport shuttle demand prediction. The presence of zeroes affects the models' learning and may result in poor performance. Furthermore, zeroes also distort the metrics used to compute the model's prediction quality. This paper showcases two real-world use cases (home appliances classification and airport shuttle demand prediction) where a hierarchical model applied in the context of zero-inflated data leads to excellent results. In particular, for home appliances classification, the weighted average of Precision, Recall, F1, and AUC ROC was increased by 27%, 34%, 49%, and 27%, respectively. Furthermore, it is estimated that the proposed approach is also four times more energy efficient than the SOTA approach against which it was compared to. Two-fold models performed best in all cases when predicting airport shuttle demand, and the difference against other models has been proven to be statistically significant.  ( 2 min )
    Invisible Threats: Backdoor Attack in OCR Systems. (arXiv:2310.08259v1 [cs.CR])
    Optical Character Recognition (OCR) is a widely used tool to extract text from scanned documents. Today, the state-of-the-art is achieved by exploiting deep neural networks. However, the cost of this performance is paid at the price of system vulnerability. For instance, in backdoor attacks, attackers compromise the training phase by inserting a backdoor in the victim's model that will be activated at testing time by specific patterns while leaving the overall model performance intact. This work proposes a backdoor attack for OCR resulting in the injection of non-readable characters from malicious input images. This simple but effective attack exposes the state-of-the-art OCR weakness, making the extracted text correct to human eyes but simultaneously unusable for the NLP application that uses OCR as a preprocessing step. Experimental results show that the attacked models successfully output non-readable characters for around 90% of the poisoned instances without harming their performance for the remaining instances.
    Adaptive Optimizers with Sparse Group Lasso for Neural Networks in CTR Prediction. (arXiv:2107.14432v4 [cs.LG] UPDATED)
    We develop a novel framework that adds the regularizers of the sparse group lasso to a family of adaptive optimizers in deep learning, such as Momentum, Adagrad, Adam, AMSGrad, AdaHessian, and create a new class of optimizers, which are named Group Momentum, Group Adagrad, Group Adam, Group AMSGrad and Group AdaHessian, etc., accordingly. We establish theoretically proven convergence guarantees in the stochastic convex settings, based on primal-dual methods. We evaluate the regularized effect of our new optimizers on three large-scale real-world ad click datasets with state-of-the-art deep learning models. The experimental results reveal that compared with the original optimizers with the post-processing procedure which uses the magnitude pruning method, the performance of the models can be significantly improved on the same sparsity level. Furthermore, in comparison to the cases without magnitude pruning, our methods can achieve extremely high sparsity with significantly better or highly competitive performance. The code is available at https://github.com/intelligent-machine-learning/dlrover/blob/master/tfplus.  ( 3 min )
    GIO: Gradient Information Optimization for Training Dataset Selection. (arXiv:2306.11670v2 [cs.LG] UPDATED)
    It is often advantageous to train models on a subset of the available train examples, because the examples are of variable quality or because one would like to train with fewer examples, without sacrificing performance. We present Gradient Information Optimization (GIO), a scalable, task-agnostic approach to this data selection problem that requires only a small set of (unlabeled) examples representing a target distribution. GIO begins from a natural, information-theoretic objective that is intractable in practice. Our contribution is in showing that it can be made highly scalable through a simple relaxation of the objective and a highly efficient implementation. In experiments with machine translation, spelling correction, and image recognition, we show that GIO delivers outstanding results with very small train sets. These findings are robust to different representation models and hyperparameters for GIO itself. GIO is task- and domain-agnostic and can be applied out-of-the-box to new datasets and domains.
    A Theory of Non-Linear Feature Learning with One Gradient Step in Two-Layer Neural Networks. (arXiv:2310.07891v1 [stat.ML])
    Feature learning is thought to be one of the fundamental reasons for the success of deep neural networks. It is rigorously known that in two-layer fully-connected neural networks under certain conditions, one step of gradient descent on the first layer followed by ridge regression on the second layer can lead to feature learning; characterized by the appearance of a separated rank-one component -- spike -- in the spectrum of the feature matrix. However, with a constant gradient descent step size, this spike only carries information from the linear component of the target function and therefore learning non-linear components is impossible. We show that with a learning rate that grows with the sample size, such training in fact introduces multiple rank-one components, each corresponding to a specific polynomial feature. We further prove that the limiting large-dimensional and large sample training and test errors of the updated neural networks are fully characterized by these spikes. By precisely analyzing the improvement in the loss, we demonstrate that these non-linear features can enhance learning.
    QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models. (arXiv:2310.08041v1 [cs.CL])
    Large Language Models (LLMs) excel in NLP, but their demands hinder their widespread deployment. While Quantization-Aware Training (QAT) offers a solution, its extensive training costs make Post-Training Quantization (PTQ) a more practical approach for LLMs. In existing studies, activation outliers in particular channels are identified as the bottleneck to PTQ accuracy. They propose to transform the magnitudes from activations to weights, which however offers limited alleviation or suffers from unstable gradients, resulting in a severe performance drop at low-bitwidth. In this paper, we propose QLLM, an accurate and efficient low-bitwidth PTQ method designed for LLMs. QLLM introduces an adaptive channel reassembly technique that reallocates the magnitude of outliers to other channels, thereby mitigating their impact on the quantization range. This is achieved by channel disassembly and channel assembly, which first breaks down the outlier channels into several sub-channels to ensure a more balanced distribution of activation magnitudes. Then similar channels are merged to maintain the original channel number for efficiency. Additionally, an adaptive strategy is designed to autonomously determine the optimal number of sub-channels for channel disassembly. To further compensate for the performance loss caused by quantization, we propose an efficient tuning method that only learns a small number of low-rank weights while freezing the pre-trained quantized model. After training, these low-rank parameters can be fused into the frozen weights without affecting inference. Extensive experiments on LLaMA-1 and LLaMA-2 show that QLLM can obtain accurate quantized models efficiently. For example, QLLM quantizes the 4-bit LLaMA-2-70B within 10 hours on a single A100-80G GPU, outperforming the previous state-of-the-art method by 7.89% on the average accuracy across five zero-shot tasks.  ( 3 min )
    TriRE: A Multi-Mechanism Learning Paradigm for Continual Knowledge Retention and Promotion. (arXiv:2310.08217v1 [cs.AI])
    Continual learning (CL) has remained a persistent challenge for deep neural networks due to catastrophic forgetting (CF) of previously learned tasks. Several techniques such as weight regularization, experience rehearsal, and parameter isolation have been proposed to alleviate CF. Despite their relative success, these research directions have predominantly remained orthogonal and suffer from several shortcomings, while missing out on the advantages of competing strategies. On the contrary, the brain continually learns, accommodates, and transfers knowledge across tasks by simultaneously leveraging several neurophysiological processes, including neurogenesis, active forgetting, neuromodulation, metaplasticity, experience rehearsal, and context-dependent gating, rarely resulting in CF. Inspired by how the brain exploits multiple mechanisms concurrently, we propose TriRE, a novel CL paradigm that encompasses retaining the most prominent neurons for each task, revising and solidifying the extracted knowledge of current and past tasks, and actively promoting less active neurons for subsequent tasks through rewinding and relearning. Across CL settings, TriRE significantly reduces task interference and surpasses different CL approaches considered in isolation.  ( 2 min )
    Improving Fast Minimum-Norm Attacks with Hyperparameter Optimization. (arXiv:2310.08177v1 [cs.LG])
    Evaluating the adversarial robustness of machine learning models using gradient-based attacks is challenging. In this work, we show that hyperparameter optimization can improve fast minimum-norm attacks by automating the selection of the loss function, the optimizer and the step-size scheduler, along with the corresponding hyperparameters. Our extensive evaluation involving several robust models demonstrates the improved efficacy of fast minimum-norm attacks when hyper-up with hyperparameter optimization. We release our open-source code at https://github.com/pralab/HO-FMN.  ( 2 min )
    Data-Centric Learning from Unlabeled Graphs with Diffusion Model. (arXiv:2303.10108v2 [cs.LG] UPDATED)
    Graph property prediction tasks are important and numerous. While each task offers a small size of labeled examples, unlabeled graphs have been collected from various sources and at a large scale. A conventional approach is training a model with the unlabeled graphs on self-supervised tasks and then fine-tuning the model on the prediction tasks. However, the self-supervised task knowledge could not be aligned or sometimes conflicted with what the predictions needed. In this paper, we propose to extract the knowledge underlying the large set of unlabeled graphs as a specific set of useful data points to augment each property prediction model. We use a diffusion model to fully utilize the unlabeled graphs and design two new objectives to guide the model's denoising process with each task's labeled data to generate task-specific graph examples and their labels. Experiments demonstrate that our data-centric approach performs significantly better than fifteen existing various methods on fifteen tasks. The performance improvement brought by unlabeled data is visible as the generated labeled examples unlike the self-supervised learning.  ( 2 min )
    L2P: Learning to Place for Estimating Heavy-Tailed Distributed Outcomes. (arXiv:1908.04628v3 [cs.LG] UPDATED)
    Many real-world prediction tasks have outcome variables that have characteristic heavy-tail distributions. Examples include copies of books sold, auction prices of art pieces, demand for commodities in warehouses, etc. By learning heavy-tailed distributions, "big and rare" instances (e.g., the best-sellers) will have accurate predictions. Most existing approaches are not dedicated to learning heavy-tailed distribution; thus, they heavily under-predict such instances. To tackle this problem, we introduce Learning to Place (L2P), which exploits the pairwise relationships between instances for learning. In its training phase, L2P learns a pairwise preference classifier: is instance A > instance B? In its placing phase, L2P obtains a prediction by placing the new instance among the known instances. Based on its placement, the new instance is then assigned a value for its outcome variable. Experiments on real data show that L2P outperforms competing approaches in terms of accuracy and ability to reproduce heavy-tailed outcome distribution. In addition, L2P provides an interpretable model by placing each predicted instance in relation to its comparable neighbors. Interpretable models are highly desirable when lives and treasure are at stake.
    ETDock: A Novel Equivariant Transformer for Protein-Ligand Docking. (arXiv:2310.08061v1 [q-bio.BM])
    Predicting the docking between proteins and ligands is a crucial and challenging task for drug discovery. However, traditional docking methods mainly rely on scoring functions, and deep learning-based docking approaches usually neglect the 3D spatial information of proteins and ligands, as well as the graph-level features of ligands, which limits their performance. To address these limitations, we propose an equivariant transformer neural network for protein-ligand docking pose prediction. Our approach involves the fusion of ligand graph-level features by feature processing, followed by the learning of ligand and protein representations using our proposed TAMformer module. Additionally, we employ an iterative optimization approach based on the predicted distance matrix to generate refined ligand poses. The experimental results on real datasets show that our model can achieve state-of-the-art performance.  ( 2 min )
    Lag-Llama: Towards Foundation Models for Time Series Forecasting. (arXiv:2310.08278v1 [cs.LG])
    Aiming to build foundation models for time-series forecasting and study their scaling behavior, we present here our work-in-progress on Lag-Llama, a general-purpose univariate probabilistic time-series forecasting model trained on a large collection of time-series data. The model shows good zero-shot prediction capabilities on unseen "out-of-distribution" time-series datasets, outperforming supervised baselines. We use smoothly broken power-laws to fit and predict model scaling behavior. The open source code is made available at https://github.com/kashif/pytorch-transformer-ts.
    Rethinking Large-scale Pre-ranking System: Entire-chain Cross-domain Models. (arXiv:2310.08039v1 [cs.IR])
    Industrial systems such as recommender systems and online advertising, have been widely equipped with multi-stage architectures, which are divided into several cascaded modules, including matching, pre-ranking, ranking and re-ranking. As a critical bridge between matching and ranking, existing pre-ranking approaches mainly endure sample selection bias (SSB) problem owing to ignoring the entire-chain data dependence, resulting in sub-optimal performances. In this paper, we rethink pre-ranking system from the perspective of the entire sample space, and propose Entire-chain Cross-domain Models (ECM), which leverage samples from the whole cascaded stages to effectively alleviate SSB problem. Besides, we design a fine-grained neural structure named ECMM to further improve the pre-ranking accuracy. Specifically, we propose a cross-domain multi-tower neural network to comprehensively predict for each stage result, and introduce the sub-networking routing strategy with $L0$ regularization to reduce computational costs. Evaluations on real-world large-scale traffic logs demonstrate that our pre-ranking models outperform SOTA methods while time consumption is maintained within an acceptable level, which achieves better trade-off between efficiency and effectiveness.
    LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios. (arXiv:2310.08348v1 [cs.LG])
    Building agents based on tree-search planning capabilities with learned models has achieved remarkable success in classic decision-making problems, such as Go and Atari. However, it has been deemed challenging or even infeasible to extend Monte Carlo Tree Search (MCTS) based algorithms to diverse real-world applications, especially when these environments involve complex action spaces and significant simulation costs, or inherent stochasticity. In this work, we introduce LightZero, the first unified benchmark for deploying MCTS/MuZero in general sequential decision scenarios. Specificially, we summarize the most critical challenges in designing a general MCTS-style decision-making solver, then decompose the tightly-coupled algorithm and system design of tree-search RL methods into distinct sub-modules. By incorporating more appropriate exploration and optimization strategies, we can significantly enhance these sub-modules and construct powerful LightZero agents to tackle tasks across a wide range of domains, such as board games, Atari, MuJoCo, MiniGrid and GoBigger. Detailed benchmark results reveal the significant potential of such methods in building scalable and efficient decision intelligence. The code is available as part of OpenDILab at https://github.com/opendilab/LightZero.  ( 2 min )
    Why Train More? Effective and Efficient Membership Inference via Memorization. (arXiv:2310.08015v1 [cs.LG])
    Membership Inference Attacks (MIAs) aim to identify specific data samples within the private training dataset of machine learning models, leading to serious privacy violations and other sophisticated threats. Many practical black-box MIAs require query access to the data distribution (the same distribution where the private data is drawn) to train shadow models. By doing so, the adversary obtains models trained "with" or "without" samples drawn from the distribution, and analyzes the characteristics of the samples under consideration. The adversary is often required to train more than hundreds of shadow models to extract the signals needed for MIAs; this becomes the computational overhead of MIAs. In this paper, we propose that by strategically choosing the samples, MI adversaries can maximize their attack success while minimizing the number of shadow models. First, our motivational experiments suggest memorization as the key property explaining disparate sample vulnerability to MIAs. We formalize this through a theoretical bound that connects MI advantage with memorization. Second, we show sample complexity bounds that connect the number of shadow models needed for MIAs with memorization. Lastly, we confirm our theoretical arguments with comprehensive experiments; by utilizing samples with high memorization scores, the adversary can (a) significantly improve its efficacy regardless of the MIA used, and (b) reduce the number of shadow models by nearly two orders of magnitude compared to state-of-the-art approaches.  ( 2 min )
    NeRF2: Neural Radio-Frequency Radiance Fields. (arXiv:2305.06118v2 [cs.NI] UPDATED)
    Although Maxwell discovered the physical laws of electromagnetic waves 160 years ago, how to precisely model the propagation of an RF signal in an electrically large and complex environment remains a long-standing problem. The difficulty is in the complex interactions between the RF signal and the obstacles (e.g., reflection, diffraction, etc.). Inspired by the great success of using a neural network to describe the optical field in computer vision, we propose a neural radio-frequency radiance field, NeRF$^\textbf{2}$, which represents a continuous volumetric scene function that makes sense of an RF signal's propagation. Particularly, after training with a few signal measurements, NeRF$^\textbf{2}$ can tell how/what signal is received at any position when it knows the position of a transmitter. As a physical-layer neural network, NeRF$^\textbf{2}$ can take advantage of the learned statistic model plus the physical model of ray tracing to generate a synthetic dataset that meets the training demands of application-layer artificial neural networks (ANNs). Thus, we can boost the performance of ANNs by the proposed turbo-learning, which mixes the true and synthetic datasets to intensify the training. Our experiment results show that turbo-learning can enhance performance with an approximate 50% increase. We also demonstrate the power of NeRF$^\textbf{2}$ in the field of indoor localization and 5G MIMO.
    Quasi-Arithmetic Mixtures, Divergence Minimization, and Bregman Information. (arXiv:2209.07481v2 [cs.LG] UPDATED)
    Markov Chain Monte Carlo methods for sampling from complex distributions and estimating normalization constants often simulate samples from a sequence of intermediate distributions along an annealing path, which bridges between a tractable initial distribution and a target density of interest. Prior work has constructed annealing paths using quasi-arithmetic means, and interpreted the resulting intermediate densities as minimizing an expected divergence to the endpoints. We provide a comprehensive analysis of this 'centroid' property using Bregman divergences under a monotonic embedding of the density function, thereby associating common divergences such as Amari's and Renyi's ${\alpha}$-divergences, ${(\alpha,\beta)}$-divergences, and the Jensen-Shannon divergence with intermediate densities along an annealing path. Our analysis highlights the interplay between parametric families, quasi-arithmetic means, and divergence functions using the rho-tau Bregman divergence framework of Zhang 2004,2013.
    MemSAC: Memory Augmented Sample Consistency for Large Scale Unsupervised Domain Adaptation. (arXiv:2207.12389v2 [cs.CV] UPDATED)
    Practical real world datasets with plentiful categories introduce new challenges for unsupervised domain adaptation like small inter-class discriminability, that existing approaches relying on domain invariance alone cannot handle sufficiently well. In this work we propose MemSAC, which exploits sample level similarity across source and target domains to achieve discriminative transfer, along with architectures that scale to a large number of categories. For this purpose, we first introduce a memory augmented approach to efficiently extract pairwise similarity relations between labeled source and unlabeled target domain instances, suited to handle an arbitrary number of classes. Next, we propose and theoretically justify a novel variant of the contrastive loss to promote local consistency among within-class cross domain samples while enforcing separation between classes, thus preserving discriminative transfer from source to target. We validate the advantages of MemSAC with significant improvements over previous state-of-the-art on multiple challenging transfer tasks designed for large-scale adaptation, such as DomainNet with 345 classes and fine-grained adaptation on Caltech-UCSD birds dataset with 200 classes. We also provide in-depth analysis and insights into the effectiveness of MemSAC.
    Learning Joint Latent Space EBM Prior Model for Multi-layer Generator. (arXiv:2306.06323v2 [cs.CV] UPDATED)
    This paper studies the fundamental problem of learning multi-layer generator models. The multi-layer generator model builds multiple layers of latent variables as a prior model on top of the generator, which benefits learning complex data distribution and hierarchical representations. However, such a prior model usually focuses on modeling inter-layer relations between latent variables by assuming non-informative (conditional) Gaussian distributions, which can be limited in model expressivity. To tackle this issue and learn more expressive prior models, we propose an energy-based model (EBM) on the joint latent space over all layers of latent variables with the multi-layer generator as its backbone. Such joint latent space EBM prior model captures the intra-layer contextual relations at each layer through layer-wise energy terms, and latent variables across different layers are jointly corrected. We develop a joint training scheme via maximum likelihood estimation (MLE), which involves Markov Chain Monte Carlo (MCMC) sampling for both prior and posterior distributions of the latent variables from different layers. To ensure efficient inference and learning, we further propose a variational training scheme where an inference model is used to amortize the costly posterior MCMC sampling. Our experiments demonstrate that the learned model can be expressive in generating high-quality images and capturing hierarchical features for better outlier detection.  ( 2 min )
    Precise localization within the GI tract by combining classification of CNNs and time-series analysis of HMMs. (arXiv:2310.07895v1 [cs.LG])
    This paper presents a method to efficiently classify the gastroenterologic section of images derived from Video Capsule Endoscopy (VCE) studies by exploring the combination of a Convolutional Neural Network (CNN) for classification with the time-series analysis properties of a Hidden Markov Model (HMM). It is demonstrated that successive time-series analysis identifies and corrects errors in the CNN output. Our approach achieves an accuracy of $98.04\%$ on the Rhode Island (RI) Gastroenterology dataset. This allows for precise localization within the gastrointestinal (GI) tract while requiring only approximately 1M parameters and thus, provides a method suitable for low power devices  ( 2 min )
    Participatory Personalization in Classification. (arXiv:2302.03874v2 [cs.LG] UPDATED)
    Machine learning models are often personalized with information that is protected, sensitive, self-reported, or costly to acquire. These models use information about people but do not facilitate nor inform their consent. Individuals cannot opt out of reporting personal information to a model, nor tell if they benefit from personalization in the first place. We introduce a family of classification models, called participatory systems, that let individuals opt into personalization at prediction time. We present a model-agnostic algorithm to learn participatory systems for personalization with categorical group attributes. We conduct a comprehensive empirical study of participatory systems in clinical prediction tasks, benchmarking them with common approaches for personalization and imputation. Our results demonstrate that participatory systems can facilitate and inform consent while improving performance and data use across all groups who report personal data.
    Efficient Hyperdimensional Computing. (arXiv:2301.10902v2 [cs.LG] UPDATED)
    Hyperdimensional computing (HDC) is a method to perform classification that uses binary vectors with high dimensions and the majority rule. This approach has the potential to be energy-efficient and hence deemed suitable for resource-limited platforms due to its simplicity and massive parallelism. However, in order to achieve high accuracy, HDC sometimes uses hypervectors with tens of thousands of dimensions. This potentially negates its efficiency advantage. In this paper, we examine the necessity of such high dimensions and conduct a detailed theoretical analysis of the relationship between hypervector dimensions and accuracy. Our results demonstrate that as the dimension of the hypervectors increases, the worst-case/average-case HDC prediction accuracy with the majority rule decreases. Building on this insight, we develop HDC models that use binary hypervectors with dimensions orders of magnitude lower than those of state-of-the-art HDC models while maintaining equivalent or even improved accuracy and efficiency. For instance, on the MNIST dataset, we achieve 91.12% HDC accuracy in image classification with a dimension of only 64. Our methods perform operations that are only 0.35% of other HDC models with dimensions of 10,000. Furthermore, we evaluate our methods on ISOLET, UCI-HAR, and Fashion-MNIST datasets and investigate the limits of HDC computing.  ( 2 min )
    Does Synthetic Data Make Large Language Models More Efficient?. (arXiv:2310.07830v1 [cs.CL])
    Natural Language Processing (NLP) has undergone transformative changes with the advent of deep learning methodologies. One challenge persistently confronting researchers is the scarcity of high-quality, annotated datasets that drive these models. This paper explores the nuances of synthetic data generation in NLP, with a focal point on template-based question generation. By assessing its advantages, including data augmentation potential and the introduction of structured variety, we juxtapose these benefits against inherent limitations, such as the risk of overfitting and the constraints posed by pre-defined templates. Drawing from empirical evaluations, we demonstrate the impact of template-based synthetic data on the performance of modern transformer models. We conclude by emphasizing the delicate balance required between synthetic and real-world data, and the future trajectories of integrating synthetic data in model training pipelines. The findings aim to guide NLP practitioners in harnessing synthetic data's potential, ensuring optimal model performance in diverse applications.  ( 2 min )
    Spiral-Elliptical automated galaxy morphology classification from telescope images. (arXiv:2310.07740v1 [astro-ph.IM])
    The classification of galaxy morphologies is an important step in the investigation of theories of hierarchical structure formation. While human expert visual classification remains quite effective and accurate, it cannot keep up with the massive influx of data from emerging sky surveys. A variety of approaches have been proposed to classify large numbers of galaxies; these approaches include crowdsourced visual classification, and automated and computational methods, such as machine learning methods based on designed morphology statistics and deep learning. In this work, we develop two novel galaxy morphology statistics, descent average and descent variance, which can be efficiently extracted from telescope galaxy images. We further propose simplified versions of the existing image statistics concentration, asymmetry, and clumpiness, which have been widely used in the literature of galaxy morphologies. We utilize the galaxy image data from the Sloan Digital Sky Survey to demonstrate the effective performance of our proposed image statistics at accurately detecting spiral and elliptical galaxies when used as features of a random forest classifier.  ( 2 min )
    Joint Metrics Matter: A Better Standard for Trajectory Forecasting. (arXiv:2305.06292v2 [cs.RO] UPDATED)
    Multi-modal trajectory forecasting methods commonly evaluate using single-agent metrics (marginal metrics), such as minimum Average Displacement Error (ADE) and Final Displacement Error (FDE), which fail to capture joint performance of multiple interacting agents. Only focusing on marginal metrics can lead to unnatural predictions, such as colliding trajectories or diverging trajectories for people who are clearly walking together as a group. Consequently, methods optimized for marginal metrics lead to overly-optimistic estimations of performance, which is detrimental to progress in trajectory forecasting research. In response to the limitations of marginal metrics, we present the first comprehensive evaluation of state-of-the-art (SOTA) trajectory forecasting methods with respect to multi-agent metrics (joint metrics): JADE, JFDE, and collision rate. We demonstrate the importance of joint metrics as opposed to marginal metrics with quantitative evidence and qualitative examples drawn from the ETH / UCY and Stanford Drone datasets. We introduce a new loss function incorporating joint metrics that, when applied to a SOTA trajectory forecasting method, achieves a 7\% improvement in JADE / JFDE on the ETH / UCY datasets with respect to the previous SOTA. Our results also indicate that optimizing for joint metrics naturally leads to an improvement in interaction modeling, as evidenced by a 16\% decrease in mean collision rate on the ETH / UCY datasets with respect to the previous SOTA. Code is available at \texttt{\hyperlink{https://github.com/ericaweng/joint-metrics-matter}{github.com/ericaweng/joint-metrics-matter}}.  ( 3 min )
    DeePref: Deep Reinforcement Learning For Video Prefetching In Content Delivery Networks. (arXiv:2310.07881v1 [cs.NI])
    Content Delivery Networks carry the majority of Internet traffic, and the increasing demand for video content as a major IP traffic across the Internet highlights the importance of caching and prefetching optimization algorithms. Prefetching aims to make data available in the cache before the requester places its request to reduce access time and improve the Quality of Experience on the user side. Prefetching is well investigated in operating systems, compiler instructions, in-memory cache, local storage systems, high-speed networks, and cloud systems. Traditional prefetching techniques are well adapted to a particular access pattern, but fail to adapt to sudden variations or randomization in workloads. This paper explores the use of reinforcement learning to tackle the changes in user access patterns and automatically adapt over time. To this end, we propose, DeePref, a Deep Reinforcement Learning agent for online video content prefetching in Content Delivery Networks. DeePref is a prefetcher implemented on edge networks and is agnostic to hardware design, operating systems, and applications. Our results show that DeePref DRQN, using a real-world dataset, achieves a 17% increase in prefetching accuracy and a 28% increase in prefetching coverage on average compared to baseline approaches that use video content popularity as a building block to statically or dynamically make prefetching decisions. We also study the possibility of transfer learning of statistical models from one edge network into another, where unseen user requests from unknown distribution are observed. In terms of transfer learning, the increase in prefetching accuracy and prefetching coverage are [$30%$, $10%$], respectively. Our source code will be available on Github.  ( 3 min )
    Non-Stationary Contextual Bandit Learning via Neural Predictive Ensemble Sampling. (arXiv:2310.07786v1 [cs.LG])
    Real-world applications of contextual bandits often exhibit non-stationarity due to seasonality, serendipity, and evolving social trends. While a number of non-stationary contextual bandit learning algorithms have been proposed in the literature, they excessively explore due to a lack of prioritization for information of enduring value, or are designed in ways that do not scale in modern applications with high-dimensional user-specific features and large action set, or both. In this paper, we introduce a novel non-stationary contextual bandit algorithm that addresses these concerns. It combines a scalable, deep-neural-network-based architecture with a carefully designed exploration mechanism that strategically prioritizes collecting information with the most lasting value in a non-stationary environment. Through empirical evaluations on two real-world recommendation datasets, which exhibit pronounced non-stationarity, we demonstrate that our approach significantly outperforms the state-of-the-art baselines.  ( 2 min )
    Multi-Scale Spatial-Temporal Recurrent Networks for Traffic Flow Prediction. (arXiv:2310.08138v1 [cs.LG])
    Traffic flow prediction is one of the most fundamental tasks of intelligent transportation systems. The complex and dynamic spatial-temporal dependencies make the traffic flow prediction quite challenging. Although existing spatial-temporal graph neural networks hold prominent, they often encounter challenges such as (1) ignoring the fixed graph that limits the predictive performance of the model, (2) insufficiently capturing complex spatial-temporal dependencies simultaneously, and (3) lacking attention to spatial-temporal information at different time lengths. In this paper, we propose a Multi-Scale Spatial-Temporal Recurrent Network for traffic flow prediction, namely MSSTRN, which consists of two different recurrent neural networks: the single-step gate recurrent unit and the multi-step gate recurrent unit to fully capture the complex spatial-temporal information in the traffic data under different time windows. Moreover, we propose a spatial-temporal synchronous attention mechanism that integrates adaptive position graph convolutions into the self-attention mechanism to achieve synchronous capture of spatial-temporal dependencies. We conducted extensive experiments on four real traffic datasets and demonstrated that our model achieves the best prediction accuracy with non-trivial margins compared to all the twenty baseline methods.  ( 2 min )
    Robust 1-bit Compressed Sensing with Iterative Hard Thresholding. (arXiv:2310.08019v1 [cs.IT])
    In 1-bit compressed sensing, the aim is to estimate a $k$-sparse unit vector $x\in S^{n-1}$ within an $\epsilon$ error (in $\ell_2$) from minimal number of linear measurements that are quantized to just their signs, i.e., from measurements of the form $y = \mathrm{Sign}(\langle a, x\rangle).$ In this paper, we study a noisy version where a fraction of the measurements can be flipped, potentially by an adversary. In particular, we analyze the Binary Iterative Hard Thresholding (BIHT) algorithm, a proximal gradient descent on a properly defined loss function used for 1-bit compressed sensing, in this noisy setting. It is known from recent results that, with $\tilde{O}(\frac{k}{\epsilon})$ noiseless measurements, BIHT provides an estimate within $\epsilon$ error. This result is optimal and universal, meaning one set of measurements work for all sparse vectors. In this paper, we show that BIHT also provides better results than all known methods for the noisy setting. We show that when up to $\tau$-fraction of the sign measurements are incorrect (adversarial error), with the same number of measurements as before, BIHT agnostically provides an estimate of $x$ within an $\tilde{O}(\epsilon+\tau)$ error, maintaining the universality of measurements. This establishes stability of iterative hard thresholding in the presence of measurement error. To obtain the result, we use the restricted approximate invertibility of Gaussian matrices, as well as a tight analysis of the high-dimensional geometry of the adversarially corrupted measurements.  ( 3 min )
    Relaxing the Additivity Constraints in Decentralized No-Regret High-Dimensional Bayesian Optimization. (arXiv:2305.19838v2 [cs.LG] UPDATED)
    Bayesian Optimization (BO) is typically used to optimize an unknown function $f$ that is noisy and costly to evaluate, by exploiting an acquisition function that must be maximized at each optimization step. Even if provably asymptotically optimal BO algorithms are efficient at optimizing low-dimensional functions, scaling them to high-dimensional spaces remains an open problem, often tackled by assuming an additive structure for $f$. By doing so, BO algorithms typically introduce additional restrictive assumptions on the additive structure that reduce their applicability domain. This paper contains two main contributions: (i) we relax the restrictive assumptions on the additive structure of $f$, at the expense of weakening the maximization guarantees of the acquisition function, and (ii) we address the over-exploration problem for decentralized BO algorithms. To these ends, we propose DumBO, an asymptotically optimal decentralized BO algorithm that achieves very competitive performance against state-of-the-art BO algorithms, especially when the additive structure of $f$ comprises high-dimensional factors.  ( 2 min )
    Counterfactual Explanations for Time Series Forecasting. (arXiv:2310.08137v1 [cs.LG])
    Among recent developments in time series forecasting methods, deep forecasting models have gained popularity as they can utilize hidden feature patterns in time series to improve forecasting performance. Nevertheless, the majority of current deep forecasting models are opaque, hence making it challenging to interpret the results. While counterfactual explanations have been extensively employed as a post-hoc approach for explaining classification models, their application to forecasting models still remains underexplored. In this paper, we formulate the novel problem of counterfactual generation for time series forecasting, and propose an algorithm, called ForecastCF, that solves the problem by applying gradient-based perturbations to the original time series. ForecastCF guides the perturbations by applying constraints to the forecasted values to obtain desired prediction outcomes. We experimentally evaluate ForecastCF using four state-of-the-art deep model architectures and compare to two baselines. Our results show that ForecastCF outperforms the baseline in terms of counterfactual validity and data manifold closeness. Overall, our findings suggest that ForecastCF can generate meaningful and relevant counterfactual explanations for various forecasting tasks.  ( 2 min )
    Language Models As Semantic Indexers. (arXiv:2310.07815v1 [cs.IR])
    Semantic identifier (ID) is an important concept in information retrieval that aims to preserve the semantics of objects such as documents and items inside their IDs. Previous studies typically adopt a two-stage pipeline to learn semantic IDs by first procuring embeddings using off-the-shelf text encoders and then deriving IDs based on the embeddings. However, each step introduces potential information loss and there is usually an inherent mismatch between the distribution of embeddings within the latent space produced by text encoders and the anticipated distribution required for semantic indexing. Nevertheless, it is non-trivial to design a method that can learn the document's semantic representations and its hierarchical structure simultaneously, given that semantic IDs are discrete and sequentially structured, and the semantic supervision is deficient. In this paper, we introduce LMINDEXER, a self-supervised framework to learn semantic IDs with a generative language model. We tackle the challenge of sequential discrete ID by introducing a semantic indexer capable of generating neural sequential discrete representations with progressive training and contrastive learning. In response to the semantic supervision deficiency, we propose to train the model with a self-supervised document reconstruction objective. The learned semantic indexer can facilitate various downstream tasks, such as recommendation and retrieval. We conduct experiments on three tasks including recommendation, product search, and document retrieval on five datasets from various domains, where LMINDEXER outperforms competitive baselines significantly and consistently.
    Federated Learning from Small Datasets. (arXiv:2110.03469v3 [cs.LG] UPDATED)
    Federated learning allows multiple parties to collaboratively train a joint model without sharing local data. This enables applications of machine learning in settings of inherently distributed, undisclosable data such as in the medical domain. In practice, joint training is usually achieved by aggregating local models, for which local training objectives have to be in expectation similar to the joint (global) objective. Often, however, local datasets are so small that local objectives differ greatly from the global objective, resulting in federated learning to fail. We propose a novel approach that intertwines model aggregations with permutations of local models. The permutations expose each local model to a daisy chain of local datasets resulting in more efficient training in data-sparse domains. This enables training on extremely small local datasets, such as patient data across hospitals, while retaining the training efficiency and privacy benefits of federated learning.
    Interpretable Diffusion via Information Decomposition. (arXiv:2310.07972v1 [cs.LG])
    Denoising diffusion models enable conditional generation and density modeling of complex relationships like images and text. However, the nature of the learned relationships is opaque making it difficult to understand precisely what relationships between words and parts of an image are captured, or to predict the effect of an intervention. We illuminate the fine-grained relationships learned by diffusion models by noticing a precise relationship between diffusion and information decomposition. Exact expressions for mutual information and conditional mutual information can be written in terms of the denoising model. Furthermore, pointwise estimates can be easily estimated as well, allowing us to ask questions about the relationships between specific images and captions. Decomposing information even further to understand which variables in a high-dimensional space carry information is a long-standing problem. For diffusion models, we show that a natural non-negative decomposition of mutual information emerges, allowing us to quantify informative relationships between words and pixels in an image. We exploit these new relations to measure the compositional understanding of diffusion models, to do unsupervised localization of objects in images, and to measure effects when selectively editing images through prompt interventions.
    ClimateBERT-NetZero: Detecting and Assessing Net Zero and Reduction Targets. (arXiv:2310.08096v1 [cs.LG])
    Public and private actors struggle to assess the vast amounts of information about sustainability commitments made by various institutions. To address this problem, we create a novel tool for automatically detecting corporate, national, and regional net zero and reduction targets in three steps. First, we introduce an expert-annotated data set with 3.5K text samples. Second, we train and release ClimateBERT-NetZero, a natural language classifier to detect whether a text contains a net zero or reduction target. Third, we showcase its analysis potential with two use cases: We first demonstrate how ClimateBERT-NetZero can be combined with conventional question-answering (Q&A) models to analyze the ambitions displayed in net zero and reduction targets. Furthermore, we employ the ClimateBERT-NetZero model on quarterly earning call transcripts and outline how communication patterns evolve over time. Our experiments demonstrate promising pathways for extracting and analyzing net zero and emission reduction targets at scale.
    ZEST: Attention-based Zero-Shot Learning for Unseen IoT Device Classification. (arXiv:2310.08036v1 [cs.NI])
    Recent research works have proposed machine learning models for classifying IoT devices connected to a network. However, there is still a practical challenge of not having all devices (and hence their traffic) available during the training of a model. This essentially means, during the operational phase, we need to classify new devices not seen during the training phase. To address this challenge, we propose ZEST -- a ZSL (zero-shot learning) framework based on self-attention for classifying both seen and unseen devices. ZEST consists of i) a self-attention based network feature extractor, termed SANE, for extracting latent space representations of IoT traffic, ii) a generative model that trains a decoder using latent features to generate pseudo data, and iii) a supervised model that is trained on the generated pseudo data for classifying devices. We carry out extensive experiments on real IoT traffic data; our experiments demonstrate i) ZEST achieves significant improvement (in terms of accuracy) over the baselines; ii) ZEST is able to better extract meaningful representations than LSTM which has been commonly used for modeling network traffic.  ( 2 min )
    CrIBo: Self-Supervised Learning via Cross-Image Object-Level Bootstrapping. (arXiv:2310.07855v1 [cs.CV])
    Leveraging nearest neighbor retrieval for self-supervised representation learning has proven beneficial with object-centric images. However, this approach faces limitations when applied to scene-centric datasets, where multiple objects within an image are only implicitly captured in the global representation. Such global bootstrapping can lead to undesirable entanglement of object representations. Furthermore, even object-centric datasets stand to benefit from a finer-grained bootstrapping approach. In response to these challenges, we introduce a novel Cross-Image Object-Level Bootstrapping method tailored to enhance dense visual representation learning. By employing object-level nearest neighbor bootstrapping throughout the training, CrIBo emerges as a notably strong and adequate candidate for in-context learning, leveraging nearest neighbor retrieval at test time. CrIBo shows state-of-the-art performance on the latter task while being highly competitive in more standard downstream segmentation tasks. Our code and pretrained models will be publicly available upon acceptance.  ( 2 min )
    CleftGAN: Adapting A Style-Based Generative Adversarial Network To Create Images Depicting Cleft Lip Deformity. (arXiv:2310.07969v1 [cs.CV])
    A major obstacle when attempting to train a machine learning system to evaluate facial clefts is the scarcity of large datasets of high-quality, ethics board-approved patient images. In response, we have built a deep learning-based cleft lip generator designed to produce an almost unlimited number of artificial images exhibiting high-fidelity facsimiles of cleft lip with wide variation. We undertook a transfer learning protocol testing different versions of StyleGAN-ADA (a generative adversarial network image generator incorporating adaptive data augmentation (ADA)) as the base model. Training images depicting a variety of cleft deformities were pre-processed to adjust for rotation, scaling, color adjustment and background blurring. The ADA modification of the primary algorithm permitted construction of our new generative model while requiring input of a relatively small number of training images. Adversarial training was carried out using 514 unique frontal photographs of cleft-affected faces to adapt a pre-trained model based on 70,000 normal faces. The Frechet Inception Distance (FID) was used to measure the similarity of the newly generated facial images to the cleft training dataset, while Perceptual Path Length (PPL) and the novel Divergence Index of Severity Histograms (DISH) measures were also used to assess the performance of the image generator that we dub CleftGAN. We found that StyleGAN3 with translation invariance (StyleGAN3-t) performed optimally as a base model. Generated images achieved a low FID reflecting a close similarity to our training input dataset of genuine cleft images. Low PPL and DISH measures reflected a smooth and semantically valid interpolation of images through the transfer learning process and a similar distribution of severity in the training and generated images, respectively.  ( 3 min )
    Beyond Traditional DoE: Deep Reinforcement Learning for Optimizing Experiments in Model Identification of Battery Dynamics. (arXiv:2310.08198v1 [cs.LG])
    Model identification of battery dynamics is a central problem in energy research; many energy management systems and design processes rely on accurate battery models for efficiency optimization. The standard methodology for battery modelling is traditional design of experiments (DoE), where the battery dynamics are excited with many different current profiles and the measured outputs are used to estimate the system dynamics. However, although it is possible to obtain useful models with the traditional approach, the process is time consuming and expensive because of the need to sweep many different current-profile configurations. In the present work, a novel DoE approach is developed based on deep reinforcement learning, which alters the configuration of the experiments on the fly based on the statistics of past experiments. Instead of sticking to a library of predefined current profiles, the proposed approach modifies the current profiles dynamically by updating the output space covered by past measurements, hence only the current profiles that are informative for future experiments are applied. Simulations and real experiments are used to show that the proposed approach gives models that are as accurate as those obtained with traditional DoE but by using 85\% less resources.  ( 2 min )
    Cost-Driven Hardware-Software Co-Optimization of Machine Learning Pipelines. (arXiv:2310.07940v1 [cs.LG])
    Researchers have long touted a vision of the future enabled by a proliferation of internet-of-things devices, including smart sensors, homes, and cities. Increasingly, embedding intelligence in such devices involves the use of deep neural networks. However, their storage and processing requirements make them prohibitive for cheap, off-the-shelf platforms. Overcoming those requirements is necessary for enabling widely-applicable smart devices. While many ways of making models smaller and more efficient have been developed, there is a lack of understanding of which ones are best suited for particular scenarios. More importantly for edge platforms, those choices cannot be analyzed in isolation from cost and user experience. In this work, we holistically explore how quantization, model scaling, and multi-modality interact with system components such as memory, sensors, and processors. We perform this hardware/software co-design from the cost, latency, and user-experience perspective, and develop a set of guidelines for optimal system design and model deployment for the most cost-constrained platforms. We demonstrate our approach using an end-to-end, on-device, biometric user authentication system using a $20 ESP-EYE board.  ( 2 min )
    CHIP: Contrastive Hierarchical Image Pretraining. (arXiv:2310.08304v1 [cs.CV])
    Few-shot object classification is the task of classifying objects in an image with limited number of examples as supervision. We propose a one-shot/few-shot classification model that can classify an object of any unseen class into a relatively general category in an hierarchically based classification. Our model uses a three-level hierarchical contrastive loss based ResNet152 classifier for classifying an object based on its features extracted from Image embedding, not used during the training phase. For our experimentation, we have used a subset of the ImageNet (ILSVRC-12) dataset that contains only the animal classes for training our model and created our own dataset of unseen classes for evaluating our trained model. Our model provides satisfactory results in classifying the unknown objects into a generic category which has been later discussed in greater detail.  ( 2 min )
    On the Computational Complexity of Private High-dimensional Model Selection via the Exponential Mechanism. (arXiv:2310.07852v1 [stat.ML])
    We consider the problem of model selection in a high-dimensional sparse linear regression model under the differential privacy framework. In particular, we consider the problem of differentially private best subset selection and study its utility guarantee. We adopt the well-known exponential mechanism for selecting the best model, and under a certain margin condition, we establish its strong model recovery property. However, the exponential search space of the exponential mechanism poses a serious computational bottleneck. To overcome this challenge, we propose a Metropolis-Hastings algorithm for the sampling step and establish its polynomial mixing time to its stationary distribution in the problem parameters $n,p$, and $s$. Furthermore, we also establish approximate differential privacy for the final estimates of the Metropolis-Hastings random walk using its mixing property. Finally, we also perform some illustrative simulations that echo the theoretical findings of our main results.  ( 2 min )
    The Expresssive Power of Transformers with Chain of Thought. (arXiv:2310.07923v1 [cs.LG])
    Recent theoretical work has identified surprisingly simple reasoning problems, such as checking if two nodes in a graph are connected or simulating finite-state machines, that are provably unsolvable by standard transformers that answer immediately after reading their input. However, in practice, transformers' reasoning can be improved by allowing them to use a "chain of thought" or "scratchpad", i.e., generate and condition on a sequence of intermediate tokens before answering. Motivated by this, we ask: Does such intermediate generation fundamentally extend the computational power of a decoder-only transformer? We show that the answer is yes, but the amount of increase depends crucially on the amount of intermediate generation. For instance, we find that transformer decoders with a logarithmic number of decoding steps (w.r.t. the input length) push the limits of standard transformers only slightly, while a linear number of decoding steps adds a clear new ability (under standard complexity conjectures): recognizing all regular languages. Our results also imply that linear steps keep transformer decoders within context-sensitive languages, and polynomial steps make them recognize exactly the class of polynomial-time solvable problems -- the first exact characterization of a type of transformers in terms of standard complexity classes. Together, our results provide a nuanced framework for understanding how the length of a transformer's chain of thought or scratchpad impacts its reasoning power.  ( 2 min )
    Emulating the dynamics of complex systems using autoregressive models on manifolds (mNARX). (arXiv:2306.16335v2 [stat.CO] UPDATED)
    We propose a novel surrogate modelling approach to efficiently and accurately approximate the response of complex dynamical systems driven by time-varying exogenous excitations over extended time periods. Our approach, namely manifold nonlinear autoregressive modelling with exogenous input (mNARX), involves constructing a problem-specific exogenous input manifold that is optimal for constructing autoregressive surrogates. The manifold, which forms the core of mNARX, is constructed incrementally by incorporating the physics of the system, as well as prior expert- and domain- knowledge. Because mNARX decomposes the full problem into a series of smaller sub-problems, each with a lower complexity than the original, it scales well with the complexity of the problem, both in terms of training and evaluation costs of the final surrogate. Furthermore, mNARX synergizes well with traditional dimensionality reduction techniques, making it highly suitable for modelling dynamical systems with high-dimensional exogenous inputs, a class of problems that is typically challenging to solve. Since domain knowledge is particularly abundant in physical systems, such as those found in civil and mechanical engineering, mNARX is well suited for these applications. We demonstrate that mNARX outperforms traditional autoregressive surrogates in predicting the response of a classical coupled spring-mass system excited by a one-dimensional random excitation. Additionally, we show that mNARX is well suited for emulating very high-dimensional time- and state-dependent systems, even when affected by active controllers, by surrogating the dynamics of a realistic aero-servo-elastic onshore wind turbine simulator. In general, our results demonstrate that mNARX offers promising prospects for modelling complex dynamical systems, in terms of accuracy and efficiency.  ( 3 min )
    Leader-Follower Neural Networks with Local Error Signals Inspired by Complex Collectives. (arXiv:2310.07885v1 [cs.LG])
    The collective behavior of a network with heterogeneous, resource-limited information processing units (e.g., group of fish, flock of birds, or network of neurons) demonstrates high self-organization and complexity. These emergent properties arise from simple interaction rules where certain individuals can exhibit leadership-like behavior and influence the collective activity of the group. Motivated by the intricacy of these collectives, we propose a neural network (NN) architecture inspired by the rules observed in nature's collective ensembles. This NN structure contains workers that encompass one or more information processing units (e.g., neurons, filters, layers, or blocks of layers). Workers are either leaders or followers, and we train a leader-follower neural network (LFNN) by leveraging local error signals and optionally incorporating backpropagation (BP) and global loss. We investigate worker behavior and evaluate LFNNs through extensive experimentation. Our LFNNs trained with local error signals achieve significantly lower error rates than previous BP-free algorithms on MNIST and CIFAR-10 and even surpass BP-enabled baselines. In the case of ImageNet, our LFNN-l demonstrates superior scalability and outperforms previous BP-free algorithms by a significant margin.  ( 2 min )
    Towards Causal Deep Learning for Vulnerability Detection. (arXiv:2310.07958v1 [cs.SE])
    Deep learning vulnerability detection has shown promising results in recent years. However, an important challenge that still blocks it from being very useful in practice is that the model is not robust under perturbation and it cannot generalize well over the out-of-distribution (OOD) data, e.g., applying a trained model to unseen projects in real world. We hypothesize that this is because the model learned non-robust features, e.g., variable names, that have spurious correlations with labels. When the perturbed and OOD datasets no longer have the same spurious features, the model prediction fails. To address the challenge, in this paper, we introduced causality into deep learning vulnerability detection. Our approach CausalVul consists of two phases. First, we designed novel perturbations to discover spurious features that the model may use to make predictions. Second, we applied the causal learning algorithms, specifically, do-calculus, on top of existing deep learning models to systematically remove the use of spurious features and thus promote causal based prediction. Our results show that CausalVul consistently improved the model accuracy, robustness and OOD performance for all the state-of-the-art models and datasets we experimented. To the best of our knowledge, this is the first work that introduces do calculus based causal learning to software engineering models and shows it's indeed useful for improving the model accuracy, robustness and generalization. Our replication package is located at https://figshare.com/s/0ffda320dcb96c249ef2.  ( 2 min )
    D2 Pruning: Message Passing for Balancing Diversity and Difficulty in Data Pruning. (arXiv:2310.07931v1 [cs.LG])
    Analytical theories suggest that higher-quality data can lead to lower test errors in models trained on a fixed data budget. Moreover, a model can be trained on a lower compute budget without compromising performance if a dataset can be stripped of its redundancies. Coreset selection (or data pruning) seeks to select a subset of the training data so as to maximize the performance of models trained on this subset, also referred to as coreset. There are two dominant approaches: (1) geometry-based data selection for maximizing data diversity in the coreset, and (2) functions that assign difficulty scores to samples based on training dynamics. Optimizing for data diversity leads to a coreset that is biased towards easier samples, whereas, selection by difficulty ranking omits easy samples that are necessary for the training of deep learning models. This demonstrates that data diversity and importance scores are two complementary factors that need to be jointly considered during coreset selection. We represent a dataset as an undirected graph and propose a novel pruning algorithm, D2 Pruning, that uses forward and reverse message passing over this dataset graph for coreset selection. D2 Pruning updates the difficulty scores of each example by incorporating the difficulty of its neighboring examples in the dataset graph. Then, these updated difficulty scores direct a graph-based sampling method to select a coreset that encapsulates both diverse and difficult regions of the dataset space. We evaluate supervised and self-supervised versions of our method on various vision and language datasets. Results show that D2 Pruning improves coreset selection over previous state-of-the-art methods for up to 70% pruning rates. Additionally, we find that using D2 Pruning for filtering large multimodal datasets leads to increased diversity in the dataset and improved generalization of pretrained models.  ( 3 min )
    Promoting Robustness of Randomized Smoothing: Two Cost-Effective Approaches. (arXiv:2310.07780v1 [cs.LG])
    Randomized smoothing has recently attracted attentions in the field of adversarial robustness to provide provable robustness guarantees on smoothed neural network classifiers. However, existing works show that vanilla randomized smoothing usually does not provide good robustness performance and often requires (re)training techniques on the base classifier in order to boost the robustness of the resulting smoothed classifier. In this work, we propose two cost-effective approaches to boost the robustness of randomized smoothing while preserving its clean performance. The first approach introduces a new robust training method AdvMacerwhich combines adversarial training and robustness certification maximization for randomized smoothing. We show that AdvMacer can improve the robustness performance of randomized smoothing classifiers compared to SOTA baselines, while being 3x faster to train than MACER baseline. The second approach introduces a post-processing method EsbRS which greatly improves the robustness certificate based on building model ensembles. We explore different aspects of model ensembles that has not been studied by prior works and propose a novel design methodology to further improve robustness of the ensemble based on our theoretical analysis.  ( 2 min )
    First-Order Dynamic Optimization for Streaming Convex Costs. (arXiv:2310.07925v1 [math.OC])
    This paper proposes a set of novel optimization algorithms for solving a class of convex optimization problems with time-varying streaming cost function. We develop an approach to track the optimal solution with a bounded error. Unlike the existing results, our algorithm is executed only by using the first-order derivatives of the cost function which makes it computationally efficient for optimization with time-varying cost function. We compare our algorithms to the gradient descent algorithm and show why gradient descent is not an effective solution for optimization problems with time-varying cost. Several examples including solving a model predictive control problem cast as a convex optimization problem with a streaming time-varying cost function demonstrate our results.  ( 2 min )
    Local Graph Clustering with Noisy Labels. (arXiv:2310.08031v1 [cs.LG])
    The growing interest in machine learning problems over graphs with additional node information such as texts, images, or labels has popularized methods that require the costly operation of processing the entire graph. Yet, little effort has been made to the development of fast local methods (i.e. without accessing the entire graph) that extract useful information from such data. To that end, we propose a study of local graph clustering using noisy node labels as a proxy for additional node information. In this setting, nodes receive initial binary labels based on cluster affiliation: 1 if they belong to the target cluster and 0 otherwise. Subsequently, a fraction of these labels is flipped. We investigate the benefits of incorporating noisy labels for local graph clustering. By constructing a weighted graph with such labels, we study the performance of graph diffusion-based local clustering method on both the original and the weighted graphs. From a theoretical perspective, we consider recovering an unknown target cluster with a single seed node in a random graph with independent noisy node labels. We provide sufficient conditions on the label noise under which, with high probability, using diffusion in the weighted graph yields a more accurate recovery of the target cluster. This approach proves more effective than using the given labels alone or using diffusion in the label-free original graph. Empirically, we show that reliable node labels can be obtained with just a few samples from an attributed graph. Moreover, utilizing these labels via diffusion in the weighted graph leads to significantly better local clustering performance across several real-world datasets, improving F1 scores by up to 13%.  ( 3 min )
    A Framework for Adapting Offline Algorithms to Solve Combinatorial Multi-Armed Bandit Problems with Bandit Feedback. (arXiv:2301.13326v2 [cs.LG] UPDATED)
    We investigate the problem of stochastic, combinatorial multi-armed bandits where the learner only has access to bandit feedback and the reward function can be non-linear. We provide a general framework for adapting discrete offline approximation algorithms into sublinear $\alpha$-regret methods that only require bandit feedback, achieving $\mathcal{O}\left(T^\frac{2}{3}\log(T)^\frac{1}{3}\right)$ expected cumulative $\alpha$-regret dependence on the horizon $T$. The framework only requires the offline algorithms to be robust to small errors in function evaluation. The adaptation procedure does not even require explicit knowledge of the offline approximation algorithm -- the offline algorithm can be used as a black box subroutine. To demonstrate the utility of the proposed framework, the proposed framework is applied to diverse applications in submodular maximization. The new CMAB algorithms for submodular maximization with knapsack constraints outperform a full-bandit method developed for the adversarial setting in experiments with real-world data.  ( 3 min )
    Learning to Simulate Tree-Branch Dynamics for Manipulation. (arXiv:2306.03410v2 [cs.RO] UPDATED)
    We propose to use a simulation driven inverse inference approach to model the dynamics of tree branches under manipulation. Learning branch dynamics and gaining the ability to manipulate deformable vegetation can help with occlusion-prone tasks, such as fruit picking in dense foliage, as well as moving overhanging vines and branches for navigation in dense vegetation. The underlying deformable tree geometry is encapsulated as coarse spring abstractions executed on parallel, non-differentiable simulators. The implicit statistical model defined by the simulator, reference trajectories obtained by actively probing the ground truth, and the Bayesian formalism, together guide the spring parameter posterior density estimation. Our non-parametric inference algorithm, based on Stein Variational Gradient Descent, incorporates biologically motivated assumptions into the inference process as neural network driven learnt joint priors; moreover, it leverages the finite difference scheme for gradient approximations. Real and simulated experiments confirm that our model can predict deformation trajectories, quantify the estimation uncertainty, and it can perform better when base-lined against other inference algorithms, particularly from the Monte Carlo family. The model displays strong robustness properties in the presence of heteroscedastic sensor noise; furthermore, it can generalise to unseen grasp locations.  ( 2 min )
    A Transfer-Learning-Based Prognosis Prediction Paradigm that Bridges Data Distribution Shift across EMR Datasets. (arXiv:2310.07799v1 [cs.LG])
    Due to the limited information about emerging diseases, symptoms are hard to be noticed and recognized, so that the window for clinical intervention could be ignored. An effective prognostic model is expected to assist doctors in making right diagnosis and designing personalized treatment plan, so to promptly prevent unfavorable outcomes. However, in the early stage of a disease, limited data collection and clinical experiences, plus the concern out of privacy and ethics, may result in restricted data availability for reference, to the extent that even data labels are difficult to mark correctly. In addition, Electronic Medical Record (EMR) data of different diseases or of different sources of the same disease can prove to be having serious cross-dataset feature misalignment problems, greatly mutilating the efficiency of deep learning models. This article introduces a transfer learning method to build a transition model from source dataset to target dataset. By way of constraining the distribution shift of features generated in disparate domains, domain-invariant features that are exclusively relative to downstream tasks are captured, so to cultivate a unified domain-invariant encoder across various task domains to achieve better feature representation. Experimental results of several target tasks demonstrate that our proposed model outperforms competing baseline methods and has higher rate of training convergence, especially in dealing with limited data amount. A multitude of experiences have proven the efficacy of our method to provide more accurate predictions concerning newly emergent pandemics and other diseases.  ( 3 min )
    Elastic Decision Transformer. (arXiv:2307.02484v5 [cs.LG] UPDATED)
    This paper introduces Elastic Decision Transformer (EDT), a significant advancement over the existing Decision Transformer (DT) and its variants. Although DT purports to generate an optimal trajectory, empirical evidence suggests it struggles with trajectory stitching, a process involving the generation of an optimal or near-optimal trajectory from the best parts of a set of sub-optimal trajectories. The proposed EDT differentiates itself by facilitating trajectory stitching during action inference at test time, achieved by adjusting the history length maintained in DT. Further, the EDT optimizes the trajectory by retaining a longer history when the previous trajectory is optimal and a shorter one when it is sub-optimal, enabling it to "stitch" with a more optimal trajectory. Extensive experimentation demonstrates EDT's ability to bridge the performance gap between DT-based and Q Learning-based approaches. In particular, the EDT outperforms Q Learning-based methods in a multi-task regime on the D4RL locomotion benchmark and Atari games. Videos are available at: https://kristery.github.io/edt/  ( 2 min )
    Limits of Model Selection under Transfer Learning. (arXiv:2305.00152v4 [stat.ML] UPDATED)
    Theoretical studies on transfer learning or domain adaptation have so far focused on situations with a known hypothesis class or model; however in practice, some amount of model selection is usually involved, often appearing under the umbrella term of hyperparameter-tuning: for example, one may think of the problem of tuning for the right neural network architecture towards a target task, while leveraging data from a related source task. Now, in addition to the usual tradeoffs on approximation vs estimation errors involved in model selection, this problem brings in a new complexity term, namely, the transfer distance between source and target distributions, which is known to vary with the choice of hypothesis class. We present a first study of this problem, focusing on classification; in particular, the analysis reveals some remarkable phenomena: adaptive rates, i.e., those achievable with no distributional information, can be arbitrarily slower than oracle rates, i.e., when given knowledge on distances.  ( 2 min )
    GROOT: Learning to Follow Instructions by Watching Gameplay Videos. (arXiv:2310.08235v1 [cs.AI])
    We study the problem of building a controller that can follow open-ended instructions in open-world environments. We propose to follow reference videos as instructions, which offer expressive goal specifications while eliminating the need for expensive text-gameplay annotations. A new learning framework is derived to allow learning such instruction-following controllers from gameplay videos while producing a video instruction encoder that induces a structured goal space. We implement our agent GROOT in a simple yet effective encoder-decoder architecture based on causal transformers. We evaluate GROOT against open-world counterparts and human players on a proposed Minecraft SkillForge benchmark. The Elo ratings clearly show that GROOT is closing the human-machine gap as well as exhibiting a 70% winning rate over the best generalist agent baseline. Qualitative analysis of the induced goal space further demonstrates some interesting emergent properties, including the goal composition and complex gameplay behavior synthesis. Code and video can be found on the website https://craftjarvis-groot.github.io.  ( 2 min )
    XIMAGENET-12: An Explainable AI Benchmark Dataset for Model Robustness Evaluation. (arXiv:2310.08182v1 [cs.CV])
    The lack of standardized robustness metrics and the widespread reliance on numerous unrelated benchmark datasets for testing have created a gap between academically validated robust models and their often problematic practical adoption. To address this, we introduce XIMAGENET-12, an explainable benchmark dataset with over 200K images and 15,600 manual semantic annotations. Covering 12 categories from ImageNet to represent objects commonly encountered in practical life and simulating six diverse scenarios, including overexposure, blurring, color changing, etc., we further propose a novel robustness criterion that extends beyond model generation ability assessment. This benchmark dataset, along with related code, is available at https://sites.google.com/view/ximagenet-12/home. Researchers and practitioners can leverage this resource to evaluate the robustness of their visual models under challenging conditions and ultimately benefit from the demands of practical computer vision systems.  ( 2 min )
    Samples on Thin Ice: Re-Evaluating Adversarial Pruning of Neural Networks. (arXiv:2310.08073v1 [cs.LG])
    Neural network pruning has shown to be an effective technique for reducing the network size, trading desirable properties like generalization and robustness to adversarial attacks for higher sparsity. Recent work has claimed that adversarial pruning methods can produce sparse networks while also preserving robustness to adversarial examples. In this work, we first re-evaluate three state-of-the-art adversarial pruning methods, showing that their robustness was indeed overestimated. We then compare pruned and dense versions of the same models, discovering that samples on thin ice, i.e., closer to the unpruned model's decision boundary, are typically misclassified after pruning. We conclude by discussing how this intuition may lead to designing more effective adversarial pruning methods in future work.  ( 2 min )
    Data driven modeling of self-similar dynamics. (arXiv:2310.08282v1 [cs.LG])
    Multiscale modeling of complex systems is crucial for understanding their intricacies. Data-driven multiscale modeling has emerged as a promising approach to tackle challenges associated with complex systems. On the other hand, self-similarity is prevalent in complex systems, hinting that large-scale complex systems can be modeled at a reduced cost. In this paper, we introduce a multiscale neural network framework that incorporates self-similarity as prior knowledge, facilitating the modeling of self-similar dynamical systems. For deterministic dynamics, our framework can discern whether the dynamics are self-similar. For uncertain dynamics, it can compare and determine which parameter set is closer to self-similarity. The framework allows us to extract scale-invariant kernels from the dynamics for modeling at any scale. Moreover, our method can identify the power law exponents in self-similar systems. Preliminary tests on the Ising model yielded critical exponents consistent with theoretical expectations, providing valuable insights for addressing critical phase transitions in non-equilibrium systems.  ( 2 min )
    To token or not to token: A Comparative Study of Text Representations for Cross-Lingual Transfer. (arXiv:2310.08078v1 [cs.CL])
    Choosing an appropriate tokenization scheme is often a bottleneck in low-resource cross-lingual transfer. To understand the downstream implications of text representation choices, we perform a comparative analysis on language models having diverse text representation modalities including 2 segmentation-based models (\texttt{BERT}, \texttt{mBERT}), 1 image-based model (\texttt{PIXEL}), and 1 character-level model (\texttt{CANINE}). First, we propose a scoring Language Quotient (LQ) metric capable of providing a weighted representation of both zero-shot and few-shot evaluation combined. Utilizing this metric, we perform experiments comprising 19 source languages and 133 target languages on three tasks (POS tagging, Dependency parsing, and NER). Our analysis reveals that image-based models excel in cross-lingual transfer when languages are closely related and share visually similar scripts. However, for tasks biased toward word meaning (POS, NER), segmentation-based models prove to be superior. Furthermore, in dependency parsing tasks where word relationships play a crucial role, models with their character-level focus, outperform others. Finally, we propose a recommendation scheme based on our findings to guide model selection according to task and language requirements.  ( 2 min )
    Learning from Label Proportions: Bootstrapping Supervised Learners via Belief Propagation. (arXiv:2310.08056v1 [cs.LG])
    Learning from Label Proportions (LLP) is a learning problem where only aggregate level labels are available for groups of instances, called bags, during training, and the aim is to get the best performance at the instance-level on the test data. This setting arises in domains like advertising and medicine due to privacy considerations. We propose a novel algorithmic framework for this problem that iteratively performs two main steps. For the first step (Pseudo Labeling) in every iteration, we define a Gibbs distribution over binary instance labels that incorporates a) covariate information through the constraint that instances with similar covariates should have similar labels and b) the bag level aggregated label. We then use Belief Propagation (BP) to marginalize the Gibbs distribution to obtain pseudo labels. In the second step (Embedding Refinement), we use the pseudo labels to provide supervision for a learner that yields a better embedding. Further, we iterate on the two steps again by using the second step's embeddings as new covariates for the next iteration. In the final iteration, a classifier is trained using the pseudo labels. Our algorithm displays strong gains against several SOTA baselines (up to 15%) for the LLP Binary Classification problem on various dataset types - tabular and Image. We achieve these improvements with minimal computational overhead above standard supervised learning due to Belief Propagation, for large bag sizes, even for a million samples.  ( 2 min )
    Overview of Physics-Informed Machine Learning Inversion of Geophysical Data. (arXiv:2310.08109v1 [physics.geo-ph])
    We review four types of algorithms for physics-informed machine learning (PIML) inversion of geophysical data. The unifying equation is given by the joint objective function $\epsilon$: \begin{eqnarray} \epsilon^{||-PIML}&=&\lambda_1 \overbrace{||{\bf W}^{ML}({\bf H}_{{\bf w}} {\bf d}^{obs}-{\bf m})||^2}^{NN} + \lambda_2 \overbrace{{||{\bf W}^{FWI}({\bf L} {\bf m}-{\bf d}^{obs})||^2}}^{FWI} ~+ \nonumber\\ \nonumber\\ && + ~~Regularizer, \label{PIML.eq120} \end{eqnarray}where the optimal model ${\bf m}^*$ and weights $\bf w^*$ minimize $\epsilon$. Here, The matrix weights are given by the boldface symbol $\bf W$, and full waveform inversion (FWI) is typically computed using a finite-difference solution of the wave equation, where $\bf L$ represents the forward modeling operation of the wave equation as a function of the model $\bf m$. Also, a fully-connected neural network (NN) is used to compute the model ${\bf H_w}{\bf d}^{obs} \approx \bf m$ from the observed input data ${\bf d}^{obs}$. The selection of weights $\lambda_i$ and the NN operations determine one of four different PIML algorithms. PIML offers potential advantages over standard FWI through its enhanced ability to avoid local minima and the option to locally train the inversion operator, minimizing the requirement for extensive training data for global applicability. However, the effectiveness of PIML relies on the similarity between the test and trained data. Nevertheless, a possible strategy to overcome this limitation involves initial pretraining of a PIML architecture with data from a broader region, followed by fine-tuning for specific data-a method reminiscent of the way large language models are pretrained and adapted for various tasks.  ( 2 min )
    LGL-BCI: A Lightweight Geometric Learning Framework for Motor Imagery-Based Brain-Computer Interfaces. (arXiv:2310.08051v1 [cs.LG])
    Brain-Computer Interfaces (BCIs) are a groundbreaking technology for interacting with external devices using brain signals. Despite advancements, electroencephalogram (EEG)-based Motor Imagery (MI) tasks face challenges like amplitude and phase variability, and complex spatial correlations, with a need for smaller model size and faster inference. This study introduces the LGL-BCI framework, employing a Geometric Deep Learning Framework for EEG processing in non-Euclidean metric spaces, particularly the Symmetric Positive Definite (SPD) Manifold space. LGL-BCI offers robust EEG data representation and captures spatial correlations. We propose an EEG channel selection solution via a feature decomposition algorithm to reduce SPD matrix dimensionality, with a lossless transformation boosting inference speed. Extensive experiments show LGL-BCI's superior accuracy and efficiency compared to current solutions, highlighting geometric deep learning's potential in MI-BCI applications. The efficiency, assessed on two public EEG datasets and two real-world EEG devices, significantly outperforms the state-of-the-art solution in accuracy ($82.54\%$ versus $62.22\%$) with fewer parameters (64.9M compared to 183.7M).  ( 2 min )
    SimCKP: Simple Contrastive Learning of Keyphrase Representations. (arXiv:2310.08221v1 [cs.CL])
    Keyphrase generation (KG) aims to generate a set of summarizing words or phrases given a source document, while keyphrase extraction (KE) aims to identify them from the text. Because the search space is much smaller in KE, it is often combined with KG to predict keyphrases that may or may not exist in the corresponding document. However, current unified approaches adopt sequence labeling and maximization-based generation that primarily operate at a token level, falling short in observing and scoring keyphrases as a whole. In this work, we propose SimCKP, a simple contrastive learning framework that consists of two stages: 1) An extractor-generator that extracts keyphrases by learning context-aware phrase-level representations in a contrastive manner while also generating keyphrases that do not appear in the document; 2) A reranker that adapts scores for each generated phrase by likewise aligning their representations with the corresponding document. Experimental results on multiple benchmark datasets demonstrate the effectiveness of our proposed approach, which outperforms the state-of-the-art models by a significant margin.  ( 2 min )
    Multi-SpacePhish: Extending the Evasion-space of Adversarial Attacks against Phishing Website Detectors using Machine Learning. (arXiv:2210.13660v3 [cs.CR] UPDATED)
    Existing literature on adversarial Machine Learning (ML) focuses either on showing attacks that break every ML model, or defenses that withstand most attacks. Unfortunately, little consideration is given to the actual feasibility of the attack or the defense. Moreover, adversarial samples are often crafted in the "feature-space", making the corresponding evaluations of questionable value. Simply put, the current situation does not allow to estimate the actual threat posed by adversarial attacks, leading to a lack of secure ML systems. We aim to clarify such confusion in this paper. By considering the application of ML for Phishing Website Detection (PWD), we formalize the "evasion-space" in which an adversarial perturbation can be introduced to fool a ML-PWD -- demonstrating that even perturbations in the "feature-space" are useful. Then, we propose a realistic threat model describing evasion attacks against ML-PWD that are cheap to stage, and hence intrinsically more attractive for real phishers. After that, we perform the first statistically validated assessment of state-of-the-art ML-PWD against 12 evasion attacks. Our evaluation shows (i) the true efficacy of evasion attempts that are more likely to occur; and (ii) the impact of perturbations crafted in different evasion-spaces. Our realistic evasion attempts induce a statistically significant degradation (3-10% at p<0.05), and their cheap cost makes them a subtle threat. Notably, however, some ML-PWD are immune to our most realistic attacks (p=0.22). Finally, as an additional contribution of this journal publication, we are the first to consider the intriguing case wherein an attacker introduces perturbations in multiple evasion-spaces at the same time. These new results show that simultaneously applying perturbations in the problem- and feature-space can cause a drop in the detection rate from 0.95 to 0.  ( 3 min )
    PRiSM: Enhancing Low-Resource Document-Level Relation Extraction with Relation-Aware Score Calibration. (arXiv:2309.13869v1 [cs.CL] CROSS LISTED)
    Document-level relation extraction (DocRE) aims to extract relations of all entity pairs in a document. A key challenge in DocRE is the cost of annotating such data which requires intensive human effort. Thus, we investigate the case of DocRE in a low-resource setting, and we find that existing models trained on low data overestimate the NA ("no relation") label, causing limited performance. In this work, we approach the problem from a calibration perspective and propose PRiSM, which learns to adapt logits based on relation semantic information. We evaluate our method on three DocRE datasets and demonstrate that integrating existing models with PRiSM improves performance by as much as 26.38 F1 score, while the calibration error drops as much as 36 times when trained with about 3% of data. The code is publicly available at https://github.com/brightjade/PRiSM.  ( 2 min )
    Model-Free, Regret-Optimal Best Policy Identification in Online CMDPs. (arXiv:2309.15395v2 [cs.LG] UPDATED)
    This paper considers the best policy identification (BPI) problem in online Constrained Markov Decision Processes (CMDPs). We are interested in algorithms that are model-free, have low regret, and identify an optimal policy with a high probability. Existing model-free algorithms for online CMDPs with sublinear regret and constraint violation do not provide any convergence guarantee to an optimal policy and provide only average performance guarantees when a policy is uniformly sampled at random from all previously used policies. In this paper, we develop a new algorithm, named Pruning-Refinement-Identification (PRI), based on a fundamental structural property of CMDPs we discover, called limited stochasticity. The property says for a CMDP with $N$ constraints, there exists an optimal policy with at most $N$ stochastic decisions. The proposed algorithm first identifies at which step and in which state a stochastic decision has to be taken and then fine-tunes the distributions of these stochastic decisions. PRI achieves trio objectives: (i) PRI is a model-free algorithm; and (ii) it outputs a near-optimal policy with a high probability at the end of learning; and (iii) in the tabular setting, PRI guarantees $\tilde{\mathcal{O}}(\sqrt{K})$ regret and constraint violation, which significantly improves the best existing regret bound $\tilde{\mathcal{O}}(K^{\frac{4}{5}})$ under a model-free algorithm, where $K$ is the total number of episodes.  ( 2 min )
    Rethinking the BERT-like Pretraining for DNA Sequences. (arXiv:2310.07644v2 [cs.AI] UPDATED)
    With the success of large-scale pretraining in NLP, there is an increasing trend of applying it to the domain of life sciences. In particular, pretraining methods based on DNA sequences have garnered growing attention due to their potential to capture generic information about genes. However, existing pretraining methods for DNA sequences largely rely on direct adoptions of BERT pretraining from NLP, lacking a comprehensive understanding and a specifically tailored approach. To address this research gap, we first conducted a series of exploratory experiments and gained several insightful observations: 1) In the fine-tuning phase of downstream tasks, when using K-mer overlapping tokenization instead of K-mer non-overlapping tokenization, both overlapping and non-overlapping pretraining weights show consistent performance improvement.2) During the pre-training process, using K-mer overlapping tokenization quickly produces clear K-mer embeddings and reduces the loss to a very low level, while using K-mer non-overlapping tokenization results in less distinct embeddings and continuously decreases the loss. 3) Using overlapping tokenization causes the self-attention in the intermediate layers of pre-trained models to tend to overly focus on certain tokens, reflecting that these layers are not adequately optimized. In summary, overlapping tokenization can benefit the fine-tuning of downstream tasks but leads to inadequate pretraining with fast convergence. To unleash the pretraining potential, we introduce a novel approach called RandomMask, which gradually increases the task difficulty of BERT-like pretraining by continuously expanding its mask boundary, forcing the model to learn more knowledge. RandomMask is simple but effective, achieving top-tier performance across 26 datasets of 28 datasets spanning 7 downstream tasks.  ( 3 min )
    Memorization Capacity of Multi-Head Attention in Transformers. (arXiv:2306.02010v2 [cs.LG] UPDATED)
    Transformers have become the go-to architecture for language and vision tasks, yet their theoretical properties, especially memorization capacity, remain elusive. This paper investigates the memorization abilities of multi-head attention mechanisms, examining how many example sequences they can memorize, as a function of the number of heads and sequence length. Motivated by experimental findings on vision transformers, we introduce novel assumptions about the linear independence of input data, distinct from the commonly used general-position assumption. Under these assumptions, we demonstrate that an attention layer with $H$ heads, dimension $d$, and context size $n < d$, featuring $\Theta(Hd^2)$ parameters, can memorize $\Omega(Hn)$ examples. Our analysis sheds light on how different attention heads handle various example sequences, aided by the softmax operator's saturation property. We validate our findings through experiments on synthetic data.  ( 2 min )
    CRITERIA: a New Benchmarking Paradigm for Evaluating Trajectory Prediction Models for Autonomous Driving. (arXiv:2310.07794v1 [cs.CV])
    Benchmarking is a common method for evaluating trajectory prediction models for autonomous driving. Existing benchmarks rely on datasets, which are biased towards more common scenarios, such as cruising, and distance-based metrics that are computed by averaging over all scenarios. Following such a regiment provides a little insight into the properties of the models both in terms of how well they can handle different scenarios and how admissible and diverse their outputs are. There exist a number of complementary metrics designed to measure the admissibility and diversity of trajectories, however, they suffer from biases, such as length of trajectories. In this paper, we propose a new benChmarking paRadIgm for evaluaTing trajEctoRy predIction Approaches (CRITERIA). Particularly, we propose 1) a method for extracting driving scenarios at varying levels of specificity according to the structure of the roads, models' performance, and data properties for fine-grained ranking of prediction models; 2) A set of new bias-free metrics for measuring diversity, by incorporating the characteristics of a given scenario, and admissibility, by considering the structure of roads and kinematic compliancy, motivated by real-world driving constraints. 3) Using the proposed benchmark, we conduct extensive experimentation on a representative set of the prediction models using the large scale Argoverse dataset. We show that the proposed benchmark can produce a more accurate ranking of the models and serve as a means of characterizing their behavior. We further present ablation studies to highlight contributions of different elements that are used to compute the proposed metrics.  ( 3 min )
    Dynamic Subgoal-based Exploration via Bayesian Optimization. (arXiv:1910.09143v5 [math.OC] UPDATED)
    Reinforcement learning in sparse-reward navigation environments with expensive and limited interactions is challenging and poses a need for effective exploration. Motivated by complex navigation tasks that require real-world training (when cheap simulators are not available), we consider an agent that faces an unknown distribution of environments and must decide on an exploration strategy. It may leverage a series of training environments to improve its policy before it is evaluated in a test environment drawn from the same environment distribution. Most existing approaches focus on fixed exploration strategies, while the few that view exploration as a meta-optimization problem tend to ignore the need for cost-efficient exploration. We propose a cost-aware Bayesian optimization approach that efficiently searches over a class of dynamic subgoal-based exploration strategies. The algorithm adjusts a variety of levers -- the locations of the subgoals, the length of each episode, and the number of replications per trial -- in order to overcome the challenges of sparse rewards, expensive interactions, and noise. An experimental evaluation demonstrates that the new approach outperforms existing baselines across a number of problem domains. We also provide a theoretical foundation and prove that the method asymptotically identifies a near-optimal subgoal design.  ( 2 min )
    Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning. (arXiv:2301.10886v5 [cs.LG] UPDATED)
    We present AIRS: Automatic Intrinsic Reward Shaping that intelligently and adaptively provides high-quality intrinsic rewards to enhance exploration in reinforcement learning (RL). More specifically, AIRS selects shaping function from a predefined set based on the estimated task return in real-time, providing reliable exploration incentives and alleviating the biased objective problem. Moreover, we develop an intrinsic reward toolkit to provide efficient and reliable implementations of diverse intrinsic reward approaches. We test AIRS on various tasks of MiniGrid, Procgen, and DeepMind Control Suite. Extensive simulation demonstrates that AIRS can outperform the benchmarking schemes and achieve superior performance with simple architecture.  ( 2 min )
    Refined Mechanism Design for Approximately Structured Priors via Active Regression. (arXiv:2310.07874v1 [cs.GT])
    We consider the problem of a revenue-maximizing seller with a large number of items $m$ for sale to $n$ strategic bidders, whose valuations are drawn independently from high-dimensional, unknown prior distributions. It is well-known that optimal and even approximately-optimal mechanisms for this setting are notoriously difficult to characterize or compute, and, even when they can be found, are often rife with various counter-intuitive properties. In this paper, following a model introduced recently by Cai and Daskalakis~\cite{cai2022recommender}, we consider the case that bidders' prior distributions can be well-approximated by a topic model. We design an active learning component, responsible for interacting with the bidders and outputting low-dimensional approximations of their types, and a mechanism design component, responsible for robustifying mechanisms for the low-dimensional model to work for the approximate types of the former component. On the active learning front, we cast our problem in the framework of Randomized Linear Algebra (RLA) for regression problems, allowing us to import several breakthrough results from that line of research, and adapt them to our setting. On the mechanism design front, we remove many restrictive assumptions of prior work on the type of access needed to the underlying distributions and the associated mechanisms. To the best of our knowledge, our work is the first to formulate connections between mechanism design, and RLA for active learning of regression problems, opening the door for further applications of randomized linear algebra primitives to mechanism design.  ( 3 min )
    SEE-OoD: Supervised Exploration For Enhanced Out-of-Distribution Detection. (arXiv:2310.08040v1 [cs.LG])
    Current techniques for Out-of-Distribution (OoD) detection predominantly rely on quantifying predictive uncertainty and incorporating model regularization during the training phase, using either real or synthetic OoD samples. However, methods that utilize real OoD samples lack exploration and are prone to overfit the OoD samples at hand. Whereas synthetic samples are often generated based on features extracted from training data, rendering them less effective when the training and OoD data are highly overlapped in the feature space. In this work, we propose a Wasserstein-score-based generative adversarial training scheme to enhance OoD detection accuracy, which, for the first time, performs data augmentation and exploration simultaneously under the supervision of limited OoD samples. Specifically, the generator explores OoD spaces and generates synthetic OoD samples using feedback from the discriminator, while the discriminator exploits both the observed and synthesized samples for OoD detection using a predefined Wasserstein score. We provide theoretical guarantees that the optimal solutions of our generative scheme are statistically achievable through adversarial training in empirical settings. We then demonstrate that the proposed method outperforms state-of-the-art techniques on various computer vision datasets and exhibits superior generalizability to unseen OoD data.  ( 2 min )
    Open-Set Knowledge-Based Visual Question Answering with Inference Paths. (arXiv:2310.08148v1 [cs.LG])
    Given an image and an associated textual question, the purpose of Knowledge-Based Visual Question Answering (KB-VQA) is to provide a correct answer to the question with the aid of external knowledge bases. Prior KB-VQA models are usually formulated as a retriever-classifier framework, where a pre-trained retriever extracts textual or visual information from knowledge graphs and then makes a prediction among the candidates. Despite promising progress, there are two drawbacks with existing models. Firstly, modeling question-answering as multi-class classification limits the answer space to a preset corpus and lacks the ability of flexible reasoning. Secondly, the classifier merely consider "what is the answer" without "how to get the answer", which cannot ground the answer to explicit reasoning paths. In this paper, we confront the challenge of \emph{explainable open-set} KB-VQA, where the system is required to answer questions with entities at wild and retain an explainable reasoning path. To resolve the aforementioned issues, we propose a new retriever-ranker paradigm of KB-VQA, Graph pATH rankER (GATHER for brevity). Specifically, it contains graph constructing, pruning, and path-level ranking, which not only retrieves accurate answers but also provides inference paths that explain the reasoning process. To comprehensively evaluate our model, we reformulate the benchmark dataset OK-VQA with manually corrected entity-level annotations and release it as ConceptVQA. Extensive experiments on real-world questions demonstrate that our framework is not only able to perform open-set question answering across the whole knowledge base but provide explicit reasoning path.  ( 2 min )
    Variational Imbalanced Regression: Fair Uncertainty Quantification via Probabilistic Smoothing. (arXiv:2306.06599v4 [cs.LG] UPDATED)
    Existing regression models tend to fall short in both accuracy and uncertainty estimation when the label distribution is imbalanced. In this paper, we propose a probabilistic deep learning model, dubbed variational imbalanced regression (VIR), which not only performs well in imbalanced regression but naturally produces reasonable uncertainty estimation as a byproduct. Different from typical variational autoencoders assuming I.I.D. representations (a data point's representation is not directly affected by other data points), our VIR borrows data with similar regression labels to compute the latent representation's variational distribution; furthermore, different from deterministic regression models producing point estimates, VIR predicts the entire normal-inverse-gamma distributions and modulates the associated conjugate distributions to impose probabilistic reweighting on the imbalanced data, thereby providing better uncertainty estimation. Experiments in several real-world datasets show that our VIR can outperform state-of-the-art imbalanced regression models in terms of both accuracy and uncertainty estimation. Code will soon be available at \url{https://github.com/Wang-ML-Lab/variational-imbalanced-regression}.  ( 2 min )
    A Generic Software Framework for Distributed Topological Analysis Pipelines. (arXiv:2310.08339v1 [cs.DC])
    This system paper presents a software framework for the support of topological analysis pipelines in a distributed-memory model. While several recent papers introduced topology-based approaches for distributed-memory environments, these were reporting experiments obtained with tailored, mono-algorithm implementations. In contrast, we describe in this paper a general-purpose, generic framework for topological analysis pipelines, i.e. a sequence of topological algorithms interacting together, possibly on distinct numbers of processes. Specifically, we instantiated our framework with the MPI model, within the Topology ToolKit (TTK). While developing this framework, we faced several algorithmic and software engineering challenges, which we document in this paper. We provide a taxonomy for the distributed-memory topological algorithms supported by TTK, depending on their communication needs and provide examples of hybrid MPI+thread parallelizations. Detailed performance analyses show that parallel efficiencies range from $20\%$ to $80\%$ (depending on the algorithms), and that the MPI-specific preconditioning introduced by our framework induces a negligible computation time overhead. We illustrate the new distributed-memory capabilities of TTK with an example of advanced analysis pipeline, combining multiple algorithms, run on the largest publicly available dataset we have found (120 billion vertices) on a standard cluster with 64 nodes (for a total of 1,536 cores). Finally, we provide a roadmap for the completion of TTK's MPI extension, along with generic recommendations for each algorithm communication category.  ( 3 min )
    Trustworthy Machine Learning. (arXiv:2310.08215v1 [cs.LG])
    As machine learning technology gets applied to actual products and solutions, new challenges have emerged. Models unexpectedly fail to generalize to small changes in the distribution, tend to be confident on novel data they have never seen, or cannot communicate the rationale behind their decisions effectively with the end users. Collectively, we face a trustworthiness issue with the current machine learning technology. This textbook on Trustworthy Machine Learning (TML) covers a theoretical and technical background of four key topics in TML: Out-of-Distribution Generalization, Explainability, Uncertainty Quantification, and Evaluation of Trustworthiness. We discuss important classical and contemporary research papers of the aforementioned fields and uncover and connect their underlying intuitions. The book evolved from the homonymous course at the University of T\"ubingen, first offered in the Winter Semester of 2022/23. It is meant to be a stand-alone product accompanied by code snippets and various pointers to further sources on topics of TML. The dedicated website of the book is https://trustworthyml.io/.  ( 2 min )
    Towards a Unified Analysis of Kernel-based Methods Under Covariate Shift. (arXiv:2310.08237v1 [stat.ML])
    Covariate shift occurs prevalently in practice, where the input distributions of the source and target data are substantially different. Despite its practical importance in various learning problems, most of the existing methods only focus on some specific learning tasks and are not well validated theoretically and numerically. To tackle this problem, we propose a unified analysis of general nonparametric methods in a reproducing kernel Hilbert space (RKHS) under covariate shift. Our theoretical results are established for a general loss belonging to a rich loss function family, which includes many commonly used methods as special cases, such as mean regression, quantile regression, likelihood-based classification, and margin-based classification. Two types of covariate shift problems are the focus of this paper and the sharp convergence rates are established for a general loss function to provide a unified theoretical analysis, which concurs with the optimal results in literature where the squared loss is used. Extensive numerical studies on synthetic and real examples confirm our theoretical findings and further illustrate the effectiveness of our proposed method.  ( 2 min )
    Multi-View Variational Autoencoder for Missing Value Imputation in Untargeted Metabolomics. (arXiv:2310.07990v1 [q-bio.GN])
    Background: Missing data is a common challenge in mass spectrometry-based metabolomics, which can lead to biased and incomplete analyses. The integration of whole-genome sequencing (WGS) data with metabolomics data has emerged as a promising approach to enhance the accuracy of data imputation in metabolomics studies. Method: In this study, we propose a novel method that leverages the information from WGS data and reference metabolites to impute unknown metabolites. Our approach utilizes a multi-view variational autoencoder to jointly model the burden score, polygenetic risk score (PGS), and linkage disequilibrium (LD) pruned single nucleotide polymorphisms (SNPs) for feature extraction and missing metabolomics data imputation. By learning the latent representations of both omics data, our method can effectively impute missing metabolomics values based on genomic information. Results: We evaluate the performance of our method on empirical metabolomics datasets with missing values and demonstrate its superiority compared to conventional imputation techniques. Using 35 template metabolites derived burden scores, PGS and LD-pruned SNPs, the proposed methods achieved r2-scores > 0.01 for 71.55% of metabolites. Conclusion: The integration of WGS data in metabolomics imputation not only improves data completeness but also enhances downstream analyses, paving the way for more comprehensive and accurate investigations of metabolic pathways and disease associations. Our findings offer valuable insights into the potential benefits of utilizing WGS data for metabolomics data imputation and underscore the importance of leveraging multi-modal data integration in precision medicine research.  ( 3 min )
    Generative Modeling with Phase Stochastic Bridges. (arXiv:2310.07805v1 [cs.LG])
    Diffusion models (DMs) represent state-of-the-art generative models for continuous inputs. DMs work by constructing a Stochastic Differential Equation (SDE) in the input space (ie, position space), and using a neural network to reverse it. In this work, we introduce a novel generative modeling framework grounded in \textbf{phase space dynamics}, where a phase space is defined as {an augmented space encompassing both position and velocity.} Leveraging insights from Stochastic Optimal Control, we construct a path measure in the phase space that enables efficient sampling. {In contrast to DMs, our framework demonstrates the capability to generate realistic data points at an early stage of dynamics propagation.} This early prediction sets the stage for efficient data generation by leveraging additional velocity information along the trajectory. On standard image generation benchmarks, our model yields favorable performance over baselines in the regime of small Number of Function Evaluations (NFEs). Furthermore, our approach rivals the performance of diffusion models equipped with efficient sampling techniques, underscoring its potential as a new tool generative modeling.  ( 2 min )
    Optimizing Convolutional Neural Networks for Chronic Obstructive Pulmonary Disease Detection in Clinical Computed Tomography Imaging. (arXiv:2303.07189v3 [eess.IV] UPDATED)
    We aim to optimize the binary detection of Chronic Obstructive Pulmonary Disease (COPD) based on emphysema presence in the lung with convolutional neural networks (CNN) by exploring manually adjusted versus automated window-setting optimization (WSO) on computed tomography (CT) images. 7,194 CT images (3,597 with COPD; 3,597 healthy controls) from 78 subjects (43 with COPD; 35 healthy controls) were selected retrospectively (10.2018-12.2019) and preprocessed. For each image, intensity values were manually clipped to the emphysema window setting and a baseline 'full-range' window setting. Class-balanced train, validation, and test sets contained 3,392, 1,114, and 2,688 images. The network backbone was optimized by comparing various CNN architectures. Furthermore, automated WSO was implemented by adding a customized layer to the model. The image-level area under the Receiver Operating Characteristics curve (AUC) [lower, upper limit 95% confidence] was utilized to compare model variations. Repeated inference (n=7) on the test set showed that the DenseNet was the most efficient backbone and achieved a mean AUC of 0.80 [0.76, 0.85] without WSO. Comparably, with input images manually adjusted to the emphysema window, the DenseNet model predicted COPD with a mean AUC of 0.86 [0.82, 0.89]. By adding a customized WSO layer to the DenseNet, an optimal window in the proximity of the emphysema window setting was learned automatically, and a mean AUC of 0.82 [0.78, 0.86] was achieved. Detection of COPD with DenseNet models was improved by WSO of CT data to the emphysema window setting range.  ( 3 min )
    High-fidelity Pseudo-labels for Boosting Weakly-Supervised Segmentation. (arXiv:2304.02621v2 [cs.CV] UPDATED)
    Image-level weakly-supervised semantic segmentation (WSSS) reduces the usually vast data annotation cost by surrogate segmentation masks during training. The typical approach involves training an image classification network using global average pooling (GAP) on convolutional feature maps. This enables the estimation of object locations based on class activation maps (CAMs), which identify the importance of image regions. The CAMs are then used to generate pseudo-labels, in the form of segmentation masks, to supervise a segmentation model in the absence of pixel-level ground truth. Our work is based on two techniques for improving CAMs; importance sampling, which is a substitute for GAP, and the feature similarity loss, which utilizes a heuristic that object contours almost always align with color edges in images. However, both are based on the multinomial posterior with softmax, and implicitly assume that classes are mutually exclusive, which turns out suboptimal in our experiments. Thus, we reformulate both techniques based on binomial posteriors of multiple independent binary problems. This has two benefits; their performance is improved and they become more general, resulting in an add-on method that can boost virtually any WSSS method. This is demonstrated on a wide variety of baselines on the PASCAL VOC dataset, improving the region similarity and contour quality of all implemented state-of-the-art methods. Experiments on the MS COCO dataset show that our proposed add-on is well-suited for large-scale settings. Our code is available at https://github.com/arvijj/hfpl.  ( 3 min )
    Differentially-Private Decision Trees and Provable Robustness to Data Poisoning. (arXiv:2305.15394v2 [cs.LG] UPDATED)
    Decision trees are interpretable models that are well-suited to non-linear learning problems. Much work has been done on extending decision tree learning algorithms with differential privacy, a system that guarantees the privacy of samples within the training data. However, current state-of-the-art algorithms for this purpose sacrifice much utility for a small privacy benefit. These solutions create random decision nodes that reduce decision tree accuracy or spend an excessive share of the privacy budget on labeling leaves. Moreover, many works do not support continuous features or leak information about them. We propose a new method called PrivaTree based on private histograms that chooses good splits while consuming a small privacy budget. The resulting trees provide a significantly better privacy-utility trade-off and accept mixed numerical and categorical data without leaking information about numerical features. Finally, while it is notoriously hard to give robustness guarantees against data poisoning attacks, we demonstrate bounds for the expected accuracy and success rates of backdoor attacks against differentially-private learners. By leveraging the better privacy-utility trade-off of PrivaTree we are able to train decision trees with significantly better robustness against backdoor attacks compared to regular decision trees and with meaningful theoretical guarantees.  ( 2 min )
    A Symmetry-Aware Exploration of Bayesian Neural Network Posteriors. (arXiv:2310.08287v1 [stat.ML])
    The distribution of the weights of modern deep neural networks (DNNs) - crucial for uncertainty quantification and robustness - is an eminently complex object due to its extremely high dimensionality. This paper proposes one of the first large-scale explorations of the posterior distribution of deep Bayesian Neural Networks (BNNs), expanding its study to real-world vision tasks and architectures. Specifically, we investigate the optimal approach for approximating the posterior, analyze the connection between posterior quality and uncertainty quantification, delve into the impact of modes on the posterior, and explore methods for visualizing the posterior. Moreover, we uncover weight-space symmetries as a critical aspect for understanding the posterior. To this extent, we develop an in-depth assessment of the impact of both permutation and scaling symmetries that tend to obfuscate the Bayesian posterior. While the first type of transformation is known for duplicating modes, we explore the relationship between the latter and L2 regularization, challenging previous misconceptions. Finally, to help the community improve our understanding of the Bayesian posterior, we will shortly release the first large-scale checkpoint dataset, including thousands of real-world models and our codes.
    Learning Transferable Conceptual Prototypes for Interpretable Unsupervised Domain Adaptation. (arXiv:2310.08071v1 [cs.LG])
    Despite the great progress of unsupervised domain adaptation (UDA) with the deep neural networks, current UDA models are opaque and cannot provide promising explanations, limiting their applications in the scenarios that require safe and controllable model decisions. At present, a surge of work focuses on designing deep interpretable methods with adequate data annotations and only a few methods consider the distributional shift problem. Most existing interpretable UDA methods are post-hoc ones, which cannot facilitate the model learning process for performance enhancement. In this paper, we propose an inherently interpretable method, named Transferable Conceptual Prototype Learning (TCPL), which could simultaneously interpret and improve the processes of knowledge transfer and decision-making in UDA. To achieve this goal, we design a hierarchically prototypical module that transfers categorical basic concepts from the source domain to the target domain and learns domain-shared prototypes for explaining the underlying reasoning process. With the learned transferable prototypes, a self-predictive consistent pseudo-label strategy that fuses confidence, predictions, and prototype information, is designed for selecting suitable target samples for pseudo annotations and gradually narrowing down the domain gap. Comprehensive experiments show that the proposed method can not only provide effective and intuitive explanations but also outperform previous state-of-the-arts.  ( 2 min )
    Graph-SCP: Accelerating Set Cover Problems with Graph Neural Networks. (arXiv:2310.07979v1 [cs.LG])
    Machine learning (ML) approaches are increasingly being used to accelerate combinatorial optimization (CO) problems. We look specifically at the Set Cover Problem (SCP) and propose Graph-SCP, a graph neural network method that can augment existing optimization solvers by learning to identify a much smaller sub-problem that contains the solution space. We evaluate the performance of Graph-SCP on synthetic weighted and unweighted SCP instances with diverse problem characteristics and complexities, and on instances from the OR Library, a canonical benchmark for SCP. We show that Graph-SCP reduces the problem size by 30-70% and achieves run time speedups up to~25x when compared to commercial solvers (Gurobi). Given a desired optimality threshold, Graph-SCP will improve upon it or even achieve 100% optimality. This is in contrast to fast greedy solutions that significantly compromise solution quality to achieve guaranteed polynomial run time. Graph-SCP can generalize to larger problem sizes and can be used with other conventional or ML-augmented CO solvers to lead to potential additional run time improvement.  ( 2 min )
    Hyperparameter Adaptive Search for Surrogate Optimization: A Self-Adjusting Approach. (arXiv:2310.07970v1 [cs.LG])
    Surrogate Optimization (SO) algorithms have shown promise for optimizing expensive black-box functions. However, their performance is heavily influenced by hyperparameters related to sampling and surrogate fitting, which poses a challenge to their widespread adoption. We investigate the impact of hyperparameters on various SO algorithms and propose a Hyperparameter Adaptive Search for SO (HASSO) approach. HASSO is not a hyperparameter tuning algorithm, but a generic self-adjusting SO algorithm that dynamically tunes its own hyperparameters while concurrently optimizing the primary objective function, without requiring additional evaluations. The aim is to improve the accessibility, effectiveness, and convergence speed of SO algorithms for practitioners. Our approach identifies and modifies the most influential hyperparameters specific to each problem and SO approach, reducing the need for manual tuning without significantly increasing the computational burden. Experimental results demonstrate the effectiveness of HASSO in enhancing the performance of various SO algorithms across different global optimization test problems.  ( 2 min )
    Discerning Temporal Difference Learning. (arXiv:2310.08091v1 [cs.LG])
    Temporal difference learning (TD) is a foundational concept in reinforcement learning (RL), aimed at efficiently assessing a policy's value function. TD($\lambda$), a potent variant, incorporates a memory trace to distribute the prediction error into the historical context. However, this approach often neglects the significance of historical states and the relative importance of propagating the TD error, influenced by challenges such as visitation imbalance or outcome noise. To address this, we propose a novel TD algorithm named discerning TD learning (DTD), which allows flexible emphasis functions$-$predetermined or adapted during training$-$to allocate efforts effectively across states. We establish the convergence properties of our method within a specific class of emphasis functions and showcase its promising potential for adaptation to deep RL contexts. Empirical results underscore that employing a judicious emphasis function not only improves value estimation but also expedites learning across diverse scenarios.  ( 2 min )
    Unraveling the Single Tangent Space Fallacy: An Analysis and Clarification for Applying Riemannian Geometry in Robot Learning. (arXiv:2310.07902v1 [cs.RO])
    In the realm of robotics, numerous downstream robotics tasks leverage machine learning methods for processing, modeling, or synthesizing data. Often, this data comprises variables that inherently carry geometric constraints, such as the unit-norm condition of quaternions representing rigid-body orientations or the positive definiteness of stiffness and manipulability ellipsoids. Handling such geometric constraints effectively requires the incorporation of tools from differential geometry into the formulation of machine learning methods. In this context, Riemannian manifolds emerge as a powerful mathematical framework to handle such geometric constraints. Nevertheless, their recent adoption in robot learning has been largely characterized by a mathematically-flawed simplification, hereinafter referred to as the ``single tangent space fallacy". This approach involves merely projecting the data of interest onto a single tangent (Euclidean) space, over which an off-the-shelf learning algorithm is applied. This paper provides a theoretical elucidation of various misconceptions surrounding this approach and offers experimental evidence of its shortcomings. Finally, it presents valuable insights to promote best practices when employing Riemannian geometry within robot learning applications.  ( 2 min )
    LEMON: Lossless model expansion. (arXiv:2310.07999v1 [cs.LG])
    Scaling of deep neural networks, especially Transformers, is pivotal for their surging performance and has further led to the emergence of sophisticated reasoning capabilities in foundation models. Such scaling generally requires training large models from scratch with random initialization, failing to leverage the knowledge acquired by their smaller counterparts, which are already resource-intensive to obtain. To tackle this inefficiency, we present $\textbf{L}$ossl$\textbf{E}$ss $\textbf{MO}$del Expansio$\textbf{N}$ (LEMON), a recipe to initialize scaled models using the weights of their smaller but pre-trained counterparts. This is followed by model training with an optimized learning rate scheduler tailored explicitly for the scaled models, substantially reducing the training time compared to training from scratch. Notably, LEMON is versatile, ensuring compatibility with various network structures, including models like Vision Transformers and BERT. Our empirical results demonstrate that LEMON reduces computational costs by 56.7% for Vision Transformers and 33.2% for BERT when compared to training from scratch.  ( 2 min )
    TabLib: A Dataset of 627M Tables with Context. (arXiv:2310.07875v1 [cs.CL])
    It is well-established that large, diverse datasets play a pivotal role in the performance of modern AI systems for text and image modalities. However, there are no datasets for tabular data of comparable size and diversity to those available for text and images. Thus we present "TabLib'', a compilation of 627 million tables totaling 69 TiB, along with 867B tokens of context. TabLib was extracted from numerous file formats, including CSV, HTML, SQLite, PDF, Excel, and others, sourced from GitHub and Common Crawl. The size and diversity of TabLib offer considerable promise in the table modality, reminiscent of the original promise of foundational datasets for text and images, such as The Pile and LAION.  ( 2 min )
    Enhanced sampling of Crystal Nucleation with Graph Representation Learnt Variables. (arXiv:2310.07927v1 [cond-mat.stat-mech])
    In this study, we present a graph neural network-based learning approach using an autoencoder setup to derive low-dimensional variables from features observed in experimental crystal structures. These variables are then biased in enhanced sampling to observe state-to-state transitions and reliable thermodynamic weights. Our approach uses simple convolution and pooling methods. To verify the effectiveness of our protocol, we examined the nucleation of various allotropes and polymorphs of iron and glycine from their molten states. Our graph latent variables when biased in well-tempered metadynamics consistently show transitions between states and achieve accurate free energy calculations in agreement with experiments, both of which are indicators of dependable sampling. This underscores the strength and promise of our graph neural net variables for improved sampling. The protocol shown here should be applicable for other systems and with other sampling methods.  ( 2 min )
    Online RL in Linearly $q^\pi$-Realizable MDPs Is as Easy as in Linear MDPs If You Learn What to Ignore. (arXiv:2310.07811v1 [cs.LG])
    We consider online reinforcement learning (RL) in episodic Markov decision processes (MDPs) under the linear $q^\pi$-realizability assumption, where it is assumed that the action-values of all policies can be expressed as linear functions of state-action features. This class is known to be more general than linear MDPs, where the transition kernel and the reward function are assumed to be linear functions of the feature vectors. As our first contribution, we show that the difference between the two classes is the presence of states in linearly $q^\pi$-realizable MDPs where for any policy, all the actions have approximately equal values, and skipping over these states by following an arbitrarily fixed policy in those states transforms the problem to a linear MDP. Based on this observation, we derive a novel (computationally inefficient) learning algorithm for linearly $q^\pi$-realizable MDPs that simultaneously learns what states should be skipped over and runs another learning algorithm on the linear MDP hidden in the problem. The method returns an $\epsilon$-optimal policy after $\text{polylog}(H, d)/\epsilon^2$ interactions with the MDP, where $H$ is the time horizon and $d$ is the dimension of the feature vectors, giving the first polynomial-sample-complexity online RL algorithm for this setting. The results are proved for the misspecified case, where the sample complexity is shown to degrade gracefully with the misspecification error.  ( 3 min )
    Large Language Models Are Zero-Shot Time Series Forecasters. (arXiv:2310.07820v1 [cs.LG])
    By encoding time series as a string of numerical digits, we can frame time series forecasting as next-token prediction in text. Developing this approach, we find that large language models (LLMs) such as GPT-3 and LLaMA-2 can surprisingly zero-shot extrapolate time series at a level comparable to or exceeding the performance of purpose-built time series models trained on the downstream tasks. To facilitate this performance, we propose procedures for effectively tokenizing time series data and converting discrete distributions over tokens into highly flexible densities over continuous values. We argue the success of LLMs for time series stems from their ability to naturally represent multimodal distributions, in conjunction with biases for simplicity, and repetition, which align with the salient features in many time series, such as repeated seasonal trends. We also show how LLMs can naturally handle missing data without imputation through non-numerical text, accommodate textual side information, and answer questions to help explain predictions. While we find that increasing model size generally improves performance on time series, we show GPT-4 can perform worse than GPT-3 because of how it tokenizes numbers, and poor uncertainty calibration, which is likely the result of alignment interventions such as RLHF.  ( 2 min )
    ASV Station Keeping under Wind Disturbances using Neural Network Simulation Error Minimization Model Predictive Control. (arXiv:2310.07892v1 [cs.RO])
    Station keeping is an essential maneuver for Autonomous Surface Vehicles (ASVs), mainly when used in confined spaces, to carry out surveys that require the ASV to keep its position or in collaboration with other vehicles where the relative position has an impact over the mission. However, this maneuver can become challenging for classic feedback controllers due to the need for an accurate model of the ASV dynamics and the environmental disturbances. This work proposes a Model Predictive Controller using Neural Network Simulation Error Minimization (NNSEM-MPC) to accurately predict the dynamics of the ASV under wind disturbances. The performance of the proposed scheme under wind disturbances is tested and compared against other controllers in simulation, using the Robotics Operating System (ROS) and the multipurpose simulation environment Gazebo. A set of six tests were conducted by combining two wind speeds (3 m/s and 6 m/s) and three wind directions (0$^\circ$, 90$^\circ$, and 180$^\circ$). The simulation results clearly show the advantage of the NNSEM-MPC over the following methods: backstepping controller, sliding mode controller, simplified dynamics MPC (SD-MPC), neural ordinary differential equation MPC (NODE-MPC), and knowledge-based NODE MPC (KNODE-MPC). The proposed NNSEM-MPC approach performs better than the rest in 4 out of the 6 test conditions, and it is the second best in the 2 remaining test cases, reducing the mean position and heading error by at least 31\% and 46\% respectively across all the test cases. In terms of execution speed, the proposed NNSEM-MPC is at least 36\% faster than the rest of the MPC controllers. The field experiments on two different ASV platforms showed that ASVs can effectively keep the station utilizing the proposed method, with a position error as low as $1.68$ m and a heading error as low as $6.14^{\circ}$ within time windows of at least $150$s.  ( 3 min )
    NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration. (arXiv:2310.07896v1 [cs.RO])
    Robotic learning for navigation in unfamiliar environments needs to provide policies for both task-oriented navigation (i.e., reaching a goal that the robot has located), and task-agnostic exploration (i.e., searching for a goal in a novel setting). Typically, these roles are handled by separate models, for example by using subgoal proposals, planning, or separate navigation strategies. In this paper, we describe how we can train a single unified diffusion policy to handle both goal-directed navigation and goal-agnostic exploration, with the latter providing the ability to search novel environments, and the former providing the ability to reach a user-specified goal once it has been located. We show that this unified policy results in better overall performance when navigating to visually indicated goals in novel environments, as compared to approaches that use subgoal proposals from generative models, or prior methods based on latent variable models. We instantiate our method by using a large-scale Transformer-based policy trained on data from multiple ground robots, with a diffusion model decoder to flexibly handle both goal-conditioned and goal-agnostic navigation. Our experiments, conducted on a real-world mobile robot platform, show effective navigation in unseen environments in comparison with five alternative methods, and demonstrate significant improvements in performance and lower collision rates, despite utilizing smaller models than state-of-the-art approaches. For more videos, code, and pre-trained model checkpoints, see https://general-navigation-models.github.io/nomad/  ( 2 min )
    RandCom: Random Communication Skipping Method for Decentralized Stochastic Optimization. (arXiv:2310.07983v1 [cs.LG])
    Distributed optimization methods with random communication skips are gaining increasing attention due to their proven benefits in accelerating communication complexity. Nevertheless, existing research mainly focuses on centralized communication protocols for strongly convex deterministic settings. In this work, we provide a decentralized optimization method called RandCom, which incorporates probabilistic local updates. We analyze the performance of RandCom in stochastic non-convex, convex, and strongly convex settings and demonstrate its ability to asymptotically reduce communication overhead by the probability of communication. Additionally, we prove that RandCom achieves linear speedup as the number of nodes increases. In stochastic strongly convex settings, we further prove that RandCom can achieve linear speedup with network-independent stepsizes. Moreover, we apply RandCom to federated learning and provide positive results concerning the potential for achieving linear speedup and the suitability of the probabilistic local update approach for non-convex settings.  ( 2 min )
    A Review of Machine Learning Techniques in Imbalanced Data and Future Trends. (arXiv:2310.07917v1 [cs.LG])
    For over two decades, detecting rare events has been a challenging task among researchers in the data mining and machine learning domain. Real-life problems inspire researchers to navigate and further improve data processing and algorithmic approaches to achieve effective and computationally efficient methods for imbalanced learning. In this paper, we have collected and reviewed 258 peer-reviewed papers from archival journals and conference papers in an attempt to provide an in-depth review of various approaches in imbalanced learning from technical and application perspectives. This work aims to provide a structured review of methods used to address the problem of imbalanced data in various domains and create a general guideline for researchers in academia or industry who want to dive into the broad field of machine learning using large-scale imbalanced data.  ( 2 min )
    QArchSearch: A Scalable Quantum Architecture Search Package. (arXiv:2310.07858v1 [quant-ph])
    The current era of quantum computing has yielded several algorithms that promise high computational efficiency. While the algorithms are sound in theory and can provide potentially exponential speedup, there is little guidance on how to design proper quantum circuits to realize the appropriate unitary transformation to be applied to the input quantum state. In this paper, we present \texttt{QArchSearch}, an AI based quantum architecture search package with the \texttt{QTensor} library as a backend that provides a principled and automated approach to finding the best model given a task and input quantum state. We show that the search package is able to efficiently scale the search to large quantum circuits and enables the exploration of more complex models for different quantum applications. \texttt{QArchSearch} runs at scale and high efficiency on high-performance computing systems using a two-level parallelization scheme on both CPUs and GPUs, which has been demonstrated on the Polaris supercomputer.  ( 2 min )
    When, Why and How Much? Adaptive Learning Rate Scheduling by Refinement. (arXiv:2310.07831v1 [cs.LG])
    Learning rate schedules used in practice bear little resemblance to those recommended by theory. We close much of this theory/practice gap, and as a consequence are able to derive new problem-adaptive learning rate schedules. Our key technical contribution is a refined analysis of learning rate schedules for a wide class of optimization algorithms (including SGD). In contrast to most prior works that study the convergence of the average iterate, we study the last iterate, which is what most people use in practice. When considering only worst-case analysis, our theory predicts that the best choice is the linear decay schedule: a popular choice in practice that sets the stepsize proportionally to $1 - t/T$, where $t$ is the current iteration and $T$ is the total number of steps. To go beyond this worst-case analysis, we use the observed gradient norms to derive schedules refined for any particular task. These refined schedules exhibit learning rate warm-up and rapid learning rate annealing near the end of training. Ours is the first systematic approach to automatically yield both of these properties. We perform the most comprehensive evaluation of learning rate schedules to date, evaluating across 10 diverse deep learning problems, a series of LLMs, and a suite of logistic regression problems. We validate that overall, the linear-decay schedule matches or outperforms all commonly used default schedules including cosine annealing, and that our schedule refinement method gives further improvements.  ( 3 min )
    Feature Learning and Generalization in Deep Networks with Orthogonal Weights. (arXiv:2310.07765v1 [cs.LG])
    Fully-connected deep neural networks with weights initialized from independent Gaussian distributions can be tuned to criticality, which prevents the exponential growth or decay of signals propagating through the network. However, such networks still exhibit fluctuations that grow linearly with the depth of the network, which may impair the training of networks with width comparable to depth. We show analytically that rectangular networks with tanh activations and weights initialized from the ensemble of orthogonal matrices have corresponding preactivation fluctuations which are independent of depth, to leading order in inverse width. Moreover, we demonstrate numerically that, at initialization, all correlators involving the neural tangent kernel (NTK) and its descendants at leading order in inverse width -- which govern the evolution of observables during training -- saturate at a depth of $\sim 20$, rather than growing without bound as in the case of Gaussian initializations. We speculate that this structure preserves finite-width feature learning while reducing overall noise, thus improving both generalization and training speed. We provide some experimental justification by relating empirical measurements of the NTK to the superior performance of deep nonlinear orthogonal networks trained under full-batch gradient descent on the MNIST and CIFAR-10 classification tasks.  ( 2 min )
    Faithfulness Measurable Masked Language Models. (arXiv:2310.07819v1 [cs.CL])
    A common approach to explain NLP models, is to use importance measures that express which tokens are important for a prediction. Unfortunately, such explanations are often wrong despite being persuasive. Therefore, it is essential to measure their faithfulness. One such metric is if tokens are truly important, then masking them should result in worse model performance. However, token masking introduces out-of-distribution issues and existing solutions are computationally expensive and employ proxy-models. Furthermore, other metrics are very limited in scope. In this work, we propose an inherently faithfulness measurable model that addresses these challenges. This is achieved by using a novel fine-tuning method that incorporates masking, such that masking tokens become in-distribution by design. This differs from existing approaches, which are completely model-agnostic but are inapplicable in practice. We demonstrate the generality of our approach by applying it to various tasks and validate it using statistical in-distribution tests. Additionally, because masking is in-distribution, importance measures which themselves use masking become more faithful, thus our model becomes more explainable.  ( 2 min )
    Using Spark Machine Learning Models to Perform Predictive Analysis on Flight Ticket Pricing Data. (arXiv:2310.07787v1 [cs.LG])
    This paper discusses predictive performance and processes undertaken on flight pricing data utilizing r2(r-square) and RMSE that leverages a large dataset, originally from Expedia.com, consisting of approximately 20 million records or 4.68 gigabytes. The project aims to determine the best models usable in the real world to predict airline ticket fares for non-stop flights across the US. Therefore, good generalization capability and optimized processing times are important measures for the model. We will discover key business insights utilizing feature importance and discuss the process and tools used for our analysis. Four regression machine learning algorithms were utilized: Random Forest, Gradient Boost Tree, Decision Tree, and Factorization Machines utilizing Cross Validator and Training Validator functions for assessing performance and generalization capability.  ( 2 min )
    Self-supervised Representation Learning From Random Data Projectors. (arXiv:2310.07756v1 [cs.LG])
    Self-supervised representation learning~(SSRL) has advanced considerably by exploiting the transformation invariance assumption under artificially designed data augmentations. While augmentation-based SSRL algorithms push the boundaries of performance in computer vision and natural language processing, they are often not directly applicable to other data modalities, and can conflict with application-specific data augmentation constraints. This paper presents an SSRL approach that can be applied to any data modality and network architecture because it does not rely on augmentations or masking. Specifically, we show that high-quality data representations can be learned by reconstructing random data projections. We evaluate the proposed approach on a wide range of representation learning tasks that span diverse modalities and real-world applications. We show that it outperforms multiple state-of-the-art SSRL baselines. Due to its wide applicability and strong empirical results, we argue that learning from randomness is a fruitful research direction worthy of attention and further study.  ( 2 min )
    Parametric Leaky Tanh: A New Hybrid Activation Function for Deep Learning. (arXiv:2310.07720v1 [cs.LG])
    Activation functions (AFs) are crucial components of deep neural networks (DNNs), having a significant impact on their performance. An activation function in a DNN is typically a smooth, nonlinear function that transforms an input signal into an output signal for the subsequent layer. In this paper, we propose the Parametric Leaky Tanh (PLTanh), a novel hybrid activation function designed to combine the strengths of both the Tanh and Leaky ReLU (LReLU) activation functions. PLTanh is differentiable at all points and addresses the 'dying ReLU' problem by ensuring a non-zero gradient for negative inputs, consistent with the behavior of LReLU. By integrating the unique advantages of these two diverse activation functions, PLTanh facilitates the learning of more intricate nonlinear relationships within the network. This paper presents an empirical evaluation of PLTanh against established activation functions, namely ReLU, LReLU, and ALReLU utilizing five diverse datasets.  ( 2 min )
    Visual Forecasting as a Mid-level Representation for Avoidance. (arXiv:2310.07724v1 [cs.RO])
    The challenge of navigation in environments with dynamic objects continues to be a central issue in the study of autonomous agents. While predictive methods hold promise, their reliance on precise state information makes them less practical for real-world implementation. This study presents visual forecasting as an innovative alternative. By introducing intuitive visual cues, this approach projects the future trajectories of dynamic objects to improve agent perception and enable anticipatory actions. Our research explores two distinct strategies for conveying predictive information through visual forecasting: (1) sequences of bounding boxes, and (2) augmented paths. To validate the proposed visual forecasting strategies, we initiate evaluations in simulated environments using the Unity engine and then extend these evaluations to real-world scenarios to assess both practicality and effectiveness. The results confirm the viability of visual forecasting as a promising solution for navigation and obstacle avoidance in dynamic environments.  ( 2 min )
  • Open

    Feature Learning and Generalization in Deep Networks with Orthogonal Weights. (arXiv:2310.07765v1 [cs.LG])
    Fully-connected deep neural networks with weights initialized from independent Gaussian distributions can be tuned to criticality, which prevents the exponential growth or decay of signals propagating through the network. However, such networks still exhibit fluctuations that grow linearly with the depth of the network, which may impair the training of networks with width comparable to depth. We show analytically that rectangular networks with tanh activations and weights initialized from the ensemble of orthogonal matrices have corresponding preactivation fluctuations which are independent of depth, to leading order in inverse width. Moreover, we demonstrate numerically that, at initialization, all correlators involving the neural tangent kernel (NTK) and its descendants at leading order in inverse width -- which govern the evolution of observables during training -- saturate at a depth of $\sim 20$, rather than growing without bound as in the case of Gaussian initializations. We speculate that this structure preserves finite-width feature learning while reducing overall noise, thus improving both generalization and training speed. We provide some experimental justification by relating empirical measurements of the NTK to the superior performance of deep nonlinear orthogonal networks trained under full-batch gradient descent on the MNIST and CIFAR-10 classification tasks.  ( 2 min )
    LEMON: Lossless model expansion. (arXiv:2310.07999v1 [cs.LG])
    Scaling of deep neural networks, especially Transformers, is pivotal for their surging performance and has further led to the emergence of sophisticated reasoning capabilities in foundation models. Such scaling generally requires training large models from scratch with random initialization, failing to leverage the knowledge acquired by their smaller counterparts, which are already resource-intensive to obtain. To tackle this inefficiency, we present $\textbf{L}$ossl$\textbf{E}$ss $\textbf{MO}$del Expansio$\textbf{N}$ (LEMON), a recipe to initialize scaled models using the weights of their smaller but pre-trained counterparts. This is followed by model training with an optimized learning rate scheduler tailored explicitly for the scaled models, substantially reducing the training time compared to training from scratch. Notably, LEMON is versatile, ensuring compatibility with various network structures, including models like Vision Transformers and BERT. Our empirical results demonstrate that LEMON reduces computational costs by 56.7% for Vision Transformers and 33.2% for BERT when compared to training from scratch.
    $L^1$ Estimation: On the Optimality of Linear Estimators. (arXiv:2309.09129v2 [math.ST] UPDATED)
    Consider the problem of estimating a random variable $X$ from noisy observations $Y = X+ Z$, where $Z$ is standard normal, under the $L^1$ fidelity criterion. It is well known that the optimal Bayesian estimator in this setting is the conditional median. This work shows that the only prior distribution on $X$ that induces linearity in the conditional median is Gaussian. Along the way, several other results are presented. In particular, it is demonstrated that if the conditional distribution $P_{X|Y=y}$ is symmetric for all $y$, then $X$ must follow a Gaussian distribution. Additionally, we consider other $L^p$ losses and observe the following phenomenon: for $p \in [1,2]$, Gaussian is the only prior distribution that induces a linear optimal Bayesian estimator, and for $p \in (2,\infty)$, infinitely many prior distributions on $X$ can induce linearity. Finally, extensions are provided to encompass noise models leading to conditional distributions from certain exponential families.
    A Symmetry-Aware Exploration of Bayesian Neural Network Posteriors. (arXiv:2310.08287v1 [stat.ML])
    The distribution of the weights of modern deep neural networks (DNNs) - crucial for uncertainty quantification and robustness - is an eminently complex object due to its extremely high dimensionality. This paper proposes one of the first large-scale explorations of the posterior distribution of deep Bayesian Neural Networks (BNNs), expanding its study to real-world vision tasks and architectures. Specifically, we investigate the optimal approach for approximating the posterior, analyze the connection between posterior quality and uncertainty quantification, delve into the impact of modes on the posterior, and explore methods for visualizing the posterior. Moreover, we uncover weight-space symmetries as a critical aspect for understanding the posterior. To this extent, we develop an in-depth assessment of the impact of both permutation and scaling symmetries that tend to obfuscate the Bayesian posterior. While the first type of transformation is known for duplicating modes, we explore the relationship between the latter and L2 regularization, challenging previous misconceptions. Finally, to help the community improve our understanding of the Bayesian posterior, we will shortly release the first large-scale checkpoint dataset, including thousands of real-world models and our codes.
    A general framework for multi-step ahead adaptive conformal heteroscedastic time series forecasting. (arXiv:2207.14219v9 [stat.ML] UPDATED)
    This paper introduces a novel model-agnostic algorithm called adaptive ensemble batch multi-input multi-output conformalized quantile regression (AEnbMIMOCQR} that enables forecasters to generate multi-step ahead prediction intervals for a fixed pre-specified miscoverage rate in a distribution-free manner. Our method is grounded on conformal prediction principles, however, it does not require data splitting and provides close to exact coverage even when the data is not exchangeable. Moreover, the resulting prediction intervals, besides being empirically valid along the forecast horizon, do not neglect heteroscedasticity. AEnbMIMOCQR is designed to be robust to distribution shifts, which means that its prediction intervals remain reliable over an unlimited period of time, without entailing retraining or imposing unrealistic strict assumptions on the data-generating process. Through methodically experimentation, we demonstrate that our approach outperforms other competitive methods on both real-world and synthetic datasets. The code used in the experimental part and a tutorial on how to use AEnbMIMOCQR can be found at the following GitHub repository: https://github.com/Quilograma/AEnbMIMOCQR.
    RandCom: Random Communication Skipping Method for Decentralized Stochastic Optimization. (arXiv:2310.07983v1 [cs.LG])
    Distributed optimization methods with random communication skips are gaining increasing attention due to their proven benefits in accelerating communication complexity. Nevertheless, existing research mainly focuses on centralized communication protocols for strongly convex deterministic settings. In this work, we provide a decentralized optimization method called RandCom, which incorporates probabilistic local updates. We analyze the performance of RandCom in stochastic non-convex, convex, and strongly convex settings and demonstrate its ability to asymptotically reduce communication overhead by the probability of communication. Additionally, we prove that RandCom achieves linear speedup as the number of nodes increases. In stochastic strongly convex settings, we further prove that RandCom can achieve linear speedup with network-independent stepsizes. Moreover, we apply RandCom to federated learning and provide positive results concerning the potential for achieving linear speedup and the suitability of the probabilistic local update approach for non-convex settings.  ( 2 min )
    Characterizing climate pathways using feature importance on echo state networks. (arXiv:2310.08495v1 [stat.ML])
    The 2022 National Defense Strategy of the United States listed climate change as a serious threat to national security. Climate intervention methods, such as stratospheric aerosol injection, have been proposed as mitigation strategies, but the downstream effects of such actions on a complex climate system are not well understood. The development of algorithmic techniques for quantifying relationships between source and impact variables related to a climate event (i.e., a climate pathway) would help inform policy decisions. Data-driven deep learning models have become powerful tools for modeling highly nonlinear relationships and may provide a route to characterize climate variable relationships. In this paper, we explore the use of an echo state network (ESN) for characterizing climate pathways. ESNs are a computationally efficient neural network variation designed for temporal data, and recent work proposes ESNs as a useful tool for forecasting spatio-temporal climate data. Like other neural networks, ESNs are non-interpretable black-box models, which poses a hurdle for understanding variable relationships. We address this issue by developing feature importance methods for ESNs in the context of spatio-temporal data to quantify variable relationships captured by the model. We conduct a simulation study to assess and compare the feature importance techniques, and we demonstrate the approach on reanalysis climate data. In the climate application, we select a time period that includes the 1991 volcanic eruption of Mount Pinatubo. This event was a significant stratospheric aerosol injection, which we use as a proxy for an artificial stratospheric aerosol injection. Using the proposed approach, we are able to characterize relationships between pathway variables associated with this event.  ( 3 min )
    Impact of multi-armed bandit strategies on deep recurrent reinforcement learning. (arXiv:2310.08331v1 [stat.ML])
    Incomplete knowledge of the environment leads an agent to make decisions under uncertainty. One of the major dilemmas in Reinforcement Learning (RL) where an autonomous agent has to balance two contrasting needs in making its decisions is: exploiting the current knowledge of the environment to maximize the cumulative reward as well as exploring actions that allow improving the knowledge of the environment, hopefully leading to higher reward values (exploration-exploitation trade-off). Concurrently, another relevant issue regards the full observability of the states, which may not be assumed in all applications. Such as when only 2D images are considered as input in a RL approach used for finding the optimal action within a 3D simulation environment. In this work, we address these issues by deploying and testing several techniques to balance exploration and exploitation trade-off on partially observable systems for predicting steering wheels in autonomous driving scenario. More precisely, the final aim is to investigate the effects of using both stochastic and deterministic multi-armed bandit strategies coupled with a Deep Recurrent Q-Network. Additionally, we adapted and evaluated the impact of an innovative method to improve the learning phase of the underlying Convolutional Recurrent Neural Network. We aim to show that adaptive stochastic methods for exploration better approximate the trade-off between exploration and exploitation as, in general, Softmax and Max-Boltzmann strategies are able to outperform epsilon-greedy techniques.  ( 2 min )
    Clustering Three-Way Data with Outliers. (arXiv:2310.05288v2 [stat.ML] UPDATED)
    Matrix-variate distributions are a recent addition to the model-based clustering field, thereby making it possible to analyze data in matrix form with complex structure such as images and time series. Due to its recent appearance, there is limited literature on matrix-variate data, with even less on dealing with outliers in these models. An approach for clustering matrix-variate normal data with outliers is discussed. The approach, which uses the distribution of subset log-likelihoods, extends the OCLUST algorithm to matrix-variate normal data and uses an iterative approach to detect and trim outliers.  ( 2 min )
    Contextualized Policy Recovery: Modeling and Interpreting Medical Decisions with Adaptive Imitation Learning. (arXiv:2310.07918v1 [cs.LG])
    Interpretable policy learning seeks to estimate intelligible decision policies from observed actions; however, existing models fall short by forcing a tradeoff between accuracy and interpretability. This tradeoff limits data-driven interpretations of human decision-making process. e.g. to audit medical decisions for biases and suboptimal practices, we require models of decision processes which provide concise descriptions of complex behaviors. Fundamentally, existing approaches are burdened by this tradeoff because they represent the underlying decision process as a universal policy, when in fact human decisions are dynamic and can change drastically with contextual information. Thus, we propose Contextualized Policy Recovery (CPR), which re-frames the problem of modeling complex decision processes as a multi-task learning problem in which complex decision policies are comprised of context-specific policies. CPR models each context-specific policy as a linear observation-to-action mapping, and generates new decision models $\textit{on-demand}$ as contexts are updated with new observations. CPR is compatible with fully offline and partially observable decision environments, and can be tailored to incorporate any recurrent black-box model or interpretable decision model. We assess CPR through studies on simulated and real data, achieving state-of-the-art performance on the canonical tasks of predicting antibiotic prescription in intensive care units ($+22\%$ AUROC vs. previous SOTA) and predicting MRI prescription for Alzheimer's patients ($+7.7\%$ AUROC vs. previous SOTA). With this improvement in predictive performance, CPR closes the accuracy gap between interpretable and black-box methods for policy learning, allowing high-resolution exploration and analysis of context-specific decision models.  ( 3 min )
    Understanding Sparse Feature Updates in Deep Networks using Iterative Linearisation. (arXiv:2211.12345v4 [cs.LG] UPDATED)
    Larger and deeper networks generalise well despite their increased capacity to overfit. Understanding why this happens is theoretically and practically important. One recent approach looks at the infinitely wide limits of such networks and their corresponding kernels. However, these theoretical tools cannot fully explain finite networks as the empirical kernel changes significantly during gradient-descent-based training in contrast to infinite networks. In this work, we derive an iterative linearised training method as a novel empirical tool to further investigate this distinction, allowing us to control for sparse (i.e. infrequent) feature updates and quantify the frequency of feature learning needed to achieve comparable performance. We justify iterative linearisation as an interpolation between a finite analog of the infinite width regime, which does not learn features, and standard gradient descent training, which does. Informally, we also show that it is analogous to a damped version of the Gauss-Newton algorithm -- a second-order method. We show that in a variety of cases, iterative linearised training surprisingly performs on par with standard training, noting in particular how much less frequent feature learning is required to achieve comparable performance. We also show that feature learning is essential for good performance. Since such feature learning inevitably causes changes in the NTK kernel, we provide direct negative evidence for the NTK theory, which states the NTK kernel remains constant during training.  ( 3 min )
    Limits of Model Selection under Transfer Learning. (arXiv:2305.00152v4 [stat.ML] UPDATED)
    Theoretical studies on transfer learning or domain adaptation have so far focused on situations with a known hypothesis class or model; however in practice, some amount of model selection is usually involved, often appearing under the umbrella term of hyperparameter-tuning: for example, one may think of the problem of tuning for the right neural network architecture towards a target task, while leveraging data from a related source task. Now, in addition to the usual tradeoffs on approximation vs estimation errors involved in model selection, this problem brings in a new complexity term, namely, the transfer distance between source and target distributions, which is known to vary with the choice of hypothesis class. We present a first study of this problem, focusing on classification; in particular, the analysis reveals some remarkable phenomena: adaptive rates, i.e., those achievable with no distributional information, can be arbitrarily slower than oracle rates, i.e., when given knowledge on distances.  ( 2 min )
    A Theory of Non-Linear Feature Learning with One Gradient Step in Two-Layer Neural Networks. (arXiv:2310.07891v1 [stat.ML])
    Feature learning is thought to be one of the fundamental reasons for the success of deep neural networks. It is rigorously known that in two-layer fully-connected neural networks under certain conditions, one step of gradient descent on the first layer followed by ridge regression on the second layer can lead to feature learning; characterized by the appearance of a separated rank-one component -- spike -- in the spectrum of the feature matrix. However, with a constant gradient descent step size, this spike only carries information from the linear component of the target function and therefore learning non-linear components is impossible. We show that with a learning rate that grows with the sample size, such training in fact introduces multiple rank-one components, each corresponding to a specific polynomial feature. We further prove that the limiting large-dimensional and large sample training and test errors of the updated neural networks are fully characterized by these spikes. By precisely analyzing the improvement in the loss, we demonstrate that these non-linear features can enhance learning.  ( 2 min )
    Log-Gaussian Gamma Processes for Training Bayesian Neural Networks in Raman and CARS Spectroscopies. (arXiv:2310.08055v1 [stat.AP])
    We propose an approach utilizing gamma-distributed random variables, coupled with log-Gaussian modeling, to generate synthetic datasets suitable for training neural networks. This addresses the challenge of limited real observations in various applications. We apply this methodology to both Raman and coherent anti-Stokes Raman scattering (CARS) spectra, using experimental spectra to estimate gamma process parameters. Parameter estimation is performed using Markov chain Monte Carlo methods, yielding a full Bayesian posterior distribution for the model which can be sampled for synthetic data generation. Additionally, we model the additive and multiplicative background functions for Raman and CARS with Gaussian processes. We train two Bayesian neural networks to estimate parameters of the gamma process which can then be used to estimate the underlying Raman spectrum and simultaneously provide uncertainty through the estimation of parameters of a probability distribution. We apply the trained Bayesian neural networks to experimental Raman spectra of phthalocyanine blue, aniline black, naphthol red, and red 264 pigments and also to experimental CARS spectra of adenosine phosphate, fructose, glucose, and sucrose. The results agree with deterministic point estimates for the underlying Raman and CARS spectral signatures.  ( 2 min )
    Learning to Act from Actionless Videos through Dense Correspondences. (arXiv:2310.08576v1 [cs.RO])
    In this work, we present an approach to construct a video-based robot policy capable of reliably executing diverse tasks across different robots and environments from few video demonstrations without using any action annotations. Our method leverages images as a task-agnostic representation, encoding both the state and action information, and text as a general representation for specifying robot goals. By synthesizing videos that ``hallucinate'' robot executing actions and in combination with dense correspondences between frames, our approach can infer the closed-formed action to execute to an environment without the need of any explicit action labels. This unique capability allows us to train the policy solely based on RGB videos and deploy learned policies to various robotic tasks. We demonstrate the efficacy of our approach in learning policies on table-top manipulation and navigation tasks. Additionally, we contribute an open-source framework for efficient video modeling, enabling the training of high-fidelity policy models with four GPUs within a single day.  ( 2 min )
    A Complete Recipe for Diffusion Generative Models. (arXiv:2303.01748v2 [cs.LG] UPDATED)
    Score-based Generative Models (SGMs) have demonstrated exceptional synthesis outcomes across various tasks. However, the current design landscape of the forward diffusion process remains largely untapped and often relies on physical heuristics or simplifying assumptions. Utilizing insights from the development of scalable Bayesian posterior samplers, we present a complete recipe for formulating forward processes in SGMs, ensuring convergence to the desired target distribution. Our approach reveals that several existing SGMs can be seen as specific manifestations of our framework. Building upon this method, we introduce Phase Space Langevin Diffusion (PSLD), which relies on score-based modeling within an augmented space enriched by auxiliary variables akin to physical phase space. Empirical results exhibit the superior sample quality and improved speed-quality trade-off of PSLD compared to various competing approaches on established image synthesis benchmarks. Remarkably, PSLD achieves sample quality akin to state-of-the-art SGMs (FID: 2.10 for unconditional CIFAR-10 generation). Lastly, we demonstrate the applicability of PSLD in conditional synthesis using pre-trained score networks, offering an appealing alternative as an SGM backbone for future advancements. Code and model checkpoints can be accessed at \url{https://github.com/mandt-lab/PSLD}.  ( 2 min )
    Smoothed $f$-Divergence Distributionally Robust Optimization. (arXiv:2306.14041v2 [math.OC] UPDATED)
    In data-driven optimization, sample average approximation (SAA) is known to suffer from the so-called optimizer's curse that causes an over-optimistic evaluation of the solution performance. We argue that a special type of distributionallly robust optimization (DRO) formulation offers theoretical advantages in correcting for this optimizer's curse compared to simple ``margin'' adjustments to SAA and other DRO approaches: It attains a statistical bound on the out-of-sample performance, for a wide class of objective functions and distributions, that is nearly tightest in terms of exponential decay rate. This DRO uses an ambiguity set based on a Kullback Leibler (KL) divergence smoothed by the Wasserstein or L\'evy-Prokhorov (LP) distance via a suitable distance optimization. Computationally, we also show that such a DRO, and its generalized versions using smoothed $f$-divergence, are not harder than DRO problems based on $f$-divergence or Wasserstein distances, rendering our DRO formulations both statistically optimal and computationally viable.  ( 2 min )
    On Regularized Sparse Logistic Regression. (arXiv:2309.05925v2 [cs.LG] UPDATED)
    Sparse logistic regression is for classification and feature selection simultaneously. Although many studies have been done to solve $\ell_1$-regularized logistic regression, there is no equivalently abundant work on solving sparse logistic regression with nonconvex regularization term. In this paper, we propose a unified framework to solve $\ell_1$-regularized logistic regression, which can be naturally extended to nonconvex regularization term, as long as certain requirement is satisfied. In addition, we also utilize a different line search criteria to guarantee monotone convergence for various regularization terms. Empirical experiments on binary classification tasks with real-world datasets demonstrate our proposed algorithms are capable of performing classification and feature selection effectively at a lower computational cost.  ( 2 min )
    When, Why and How Much? Adaptive Learning Rate Scheduling by Refinement. (arXiv:2310.07831v1 [cs.LG])
    Learning rate schedules used in practice bear little resemblance to those recommended by theory. We close much of this theory/practice gap, and as a consequence are able to derive new problem-adaptive learning rate schedules. Our key technical contribution is a refined analysis of learning rate schedules for a wide class of optimization algorithms (including SGD). In contrast to most prior works that study the convergence of the average iterate, we study the last iterate, which is what most people use in practice. When considering only worst-case analysis, our theory predicts that the best choice is the linear decay schedule: a popular choice in practice that sets the stepsize proportionally to $1 - t/T$, where $t$ is the current iteration and $T$ is the total number of steps. To go beyond this worst-case analysis, we use the observed gradient norms to derive schedules refined for any particular task. These refined schedules exhibit learning rate warm-up and rapid learning rate annealing near the end of training. Ours is the first systematic approach to automatically yield both of these properties. We perform the most comprehensive evaluation of learning rate schedules to date, evaluating across 10 diverse deep learning problems, a series of LLMs, and a suite of logistic regression problems. We validate that overall, the linear-decay schedule matches or outperforms all commonly used default schedules including cosine annealing, and that our schedule refinement method gives further improvements.  ( 3 min )
    Differentially Private Non-convex Learning for Multi-layer Neural Networks. (arXiv:2310.08425v1 [cs.LG])
    This paper focuses on the problem of Differentially Private Stochastic Optimization for (multi-layer) fully connected neural networks with a single output node. In the first part, we examine cases with no hidden nodes, specifically focusing on Generalized Linear Models (GLMs). We investigate the well-specific model where the random noise possesses a zero mean, and the link function is both bounded and Lipschitz continuous. We propose several algorithms and our analysis demonstrates the feasibility of achieving an excess population risk that remains invariant to the data dimension. We also delve into the scenario involving the ReLU link function, and our findings mirror those of the bounded link function. We conclude this section by contrasting well-specified and misspecified models, using ReLU regression as a representative example. In the second part of the paper, we extend our ideas to two-layer neural networks with sigmoid or ReLU activation functions in the well-specified model. In the third part, we study the theoretical guarantees of DP-SGD in Abadi et al. (2016) for fully connected multi-layer neural networks. By utilizing recent advances in Neural Tangent Kernel theory, we provide the first excess population risk when both the sample size and the width of the network are sufficiently large. Additionally, we discuss the role of some parameters in DP-SGD regarding their utility, both theoretically and empirically.  ( 2 min )
    Conditional Sig-Wasserstein GANs for Time Series Generation. (arXiv:2006.05421v2 [cs.LG] UPDATED)
    Generative adversarial networks (GANs) have been extremely successful in generating samples, from seemingly high dimensional probability measures. However, these methods struggle to capture the temporal dependence of joint probability distributions induced by time-series data. Furthermore, long time-series data streams hugely increase the dimension of the target space, which may render generative modelling infeasible. To overcome these challenges, motivated by the autoregressive models in econometric, we are interested in the conditional distribution of future time series given the past information. We propose the generic conditional Sig-WGAN framework by integrating Wasserstein-GANs (WGANs) with mathematically principled and efficient path feature extraction called the signature of a path. The signature of a path is a graded sequence of statistics that provides a universal description for a stream of data, and its expected value characterises the law of the time-series model. In particular, we develop the conditional Sig-$W_1$ metric, that captures the conditional joint law of time series models, and use it as a discriminator. The signature feature space enables the explicit representation of the proposed discriminators which alleviates the need for expensive training. We validate our method on both synthetic and empirical dataset and observe that our method consistently and significantly outperforms state-of-the-art benchmarks with respect to measures of similarity and predictive ability.  ( 3 min )
    An interpretable neural network-based non-proportional odds model for ordinal regression. (arXiv:2303.17823v3 [stat.ME] UPDATED)
    This study proposes an interpretable neural network-based non-proportional odds model (N$^3$POM) for ordinal regression. N$^3$POM is different from conventional approaches to ordinal regression with non-proportional models in several ways: (1) N$^3$POM is designed to directly handle continuous responses, whereas standard methods typically treat de facto ordered continuous variables as discrete, (2) instead of estimating response-dependent finite coefficients of linear models from discrete responses as is done in conventional approaches, we train a non-linear neural network to serve as a coefficient function. Thanks to the neural network, N$^3$POM offers flexibility while preserving the interpretability of conventional ordinal regression. We establish a sufficient condition under which the predicted conditional cumulative probability locally satisfies the monotonicity constraint over a user-specified region in the covariate space. Additionally, we provide a monotonicity-preserving stochastic (MPS) algorithm for effectively training the neural network. We apply N$^3$POM to several real-world datasets.  ( 2 min )
    Generalization bounds for neural ordinary differential equations and deep residual networks. (arXiv:2305.06648v2 [stat.ML] UPDATED)
    Neural ordinary differential equations (neural ODEs) are a popular family of continuous-depth deep learning models. In this work, we consider a large family of parameterized ODEs with continuous-in-time parameters, which include time-dependent neural ODEs. We derive a generalization bound for this class by a Lipschitz-based argument. By leveraging the analogy between neural ODEs and deep residual networks, our approach yields in particular a generalization bound for a class of deep residual networks. The bound involves the magnitude of the difference between successive weight matrices. We illustrate numerically how this quantity affects the generalization capability of neural networks.  ( 2 min )
    NECO: NEural Collapse Based Out-of-distribution detection. (arXiv:2310.06823v2 [stat.ML] UPDATED)
    Detecting out-of-distribution (OOD) data is a critical challenge in machine learning due to model overconfidence, often without awareness of their epistemological limits. We hypothesize that ``neural collapse'', a phenomenon affecting in-distribution data for models trained beyond loss convergence, also influences OOD data. To benefit from this interplay, we introduce NECO, a novel post-hoc method for OOD detection, which leverages the geometric properties of ``neural collapse'' and of principal component spaces to identify OOD data. Our extensive experiments demonstrate that NECO achieves state-of-the-art results on both small and large-scale OOD detection tasks while exhibiting strong generalization capabilities across different network architectures. Furthermore, we provide a theoretical explanation for the effectiveness of our method in OOD detection. We plan to release the code after the anonymity period.  ( 2 min )
    Variational Imbalanced Regression: Fair Uncertainty Quantification via Probabilistic Smoothing. (arXiv:2306.06599v4 [cs.LG] UPDATED)
    Existing regression models tend to fall short in both accuracy and uncertainty estimation when the label distribution is imbalanced. In this paper, we propose a probabilistic deep learning model, dubbed variational imbalanced regression (VIR), which not only performs well in imbalanced regression but naturally produces reasonable uncertainty estimation as a byproduct. Different from typical variational autoencoders assuming I.I.D. representations (a data point's representation is not directly affected by other data points), our VIR borrows data with similar regression labels to compute the latent representation's variational distribution; furthermore, different from deterministic regression models producing point estimates, VIR predicts the entire normal-inverse-gamma distributions and modulates the associated conjugate distributions to impose probabilistic reweighting on the imbalanced data, thereby providing better uncertainty estimation. Experiments in several real-world datasets show that our VIR can outperform state-of-the-art imbalanced regression models in terms of both accuracy and uncertainty estimation. Code will soon be available at \url{https://github.com/Wang-ML-Lab/variational-imbalanced-regression}.  ( 2 min )
    Hyperparameter Adaptive Search for Surrogate Optimization: A Self-Adjusting Approach. (arXiv:2310.07970v1 [cs.LG])
    Surrogate Optimization (SO) algorithms have shown promise for optimizing expensive black-box functions. However, their performance is heavily influenced by hyperparameters related to sampling and surrogate fitting, which poses a challenge to their widespread adoption. We investigate the impact of hyperparameters on various SO algorithms and propose a Hyperparameter Adaptive Search for SO (HASSO) approach. HASSO is not a hyperparameter tuning algorithm, but a generic self-adjusting SO algorithm that dynamically tunes its own hyperparameters while concurrently optimizing the primary objective function, without requiring additional evaluations. The aim is to improve the accessibility, effectiveness, and convergence speed of SO algorithms for practitioners. Our approach identifies and modifies the most influential hyperparameters specific to each problem and SO approach, reducing the need for manual tuning without significantly increasing the computational burden. Experimental results demonstrate the effectiveness of HASSO in enhancing the performance of various SO algorithms across different global optimization test problems.  ( 2 min )
    Robust 1-bit Compressed Sensing with Iterative Hard Thresholding. (arXiv:2310.08019v1 [cs.IT])
    In 1-bit compressed sensing, the aim is to estimate a $k$-sparse unit vector $x\in S^{n-1}$ within an $\epsilon$ error (in $\ell_2$) from minimal number of linear measurements that are quantized to just their signs, i.e., from measurements of the form $y = \mathrm{Sign}(\langle a, x\rangle).$ In this paper, we study a noisy version where a fraction of the measurements can be flipped, potentially by an adversary. In particular, we analyze the Binary Iterative Hard Thresholding (BIHT) algorithm, a proximal gradient descent on a properly defined loss function used for 1-bit compressed sensing, in this noisy setting. It is known from recent results that, with $\tilde{O}(\frac{k}{\epsilon})$ noiseless measurements, BIHT provides an estimate within $\epsilon$ error. This result is optimal and universal, meaning one set of measurements work for all sparse vectors. In this paper, we show that BIHT also provides better results than all known methods for the noisy setting. We show that when up to $\tau$-fraction of the sign measurements are incorrect (adversarial error), with the same number of measurements as before, BIHT agnostically provides an estimate of $x$ within an $\tilde{O}(\epsilon+\tau)$ error, maintaining the universality of measurements. This establishes stability of iterative hard thresholding in the presence of measurement error. To obtain the result, we use the restricted approximate invertibility of Gaussian matrices, as well as a tight analysis of the high-dimensional geometry of the adversarially corrupted measurements.  ( 3 min )
    Efficient probabilistic reconciliation of forecasts for real-valued and count time series. (arXiv:2210.02286v3 [stat.ML] UPDATED)
    Hierarchical time series are common in several applied fields. The forecasts for these time series are required to be coherent, that is, to satisfy the constraints given by the hierarchy. The most popular technique to enforce coherence is called reconciliation, which adjusts the base forecasts computed for each time series. However, recent works on probabilistic reconciliation present several limitations. In this paper, we propose a new approach based on conditioning to reconcile any type of forecast distribution. We then introduce a new algorithm, called Bottom-Up Importance Sampling, to efficiently sample from the reconciled distribution. It can be used for any base forecast distribution: discrete, continuous, or in the form of samples, providing a major speedup compared to the current methods. Experiments on several temporal hierarchies show a significant improvement over base probabilistic forecasts.  ( 2 min )
    Memorization with neural nets: going beyond the worst case. (arXiv:2310.00327v2 [stat.ML] UPDATED)
    In practice, deep neural networks are often able to easily interpolate their training data. To understand this phenomenon, many works have aimed to quantify the memorization capacity of a neural network architecture: the largest number of points such that the architecture can interpolate any placement of these points with any assignment of labels. For real-world data, however, one intuitively expects the presence of a benign structure so that interpolation already occurs at a smaller network size than suggested by memorization capacity. In this paper, we investigate interpolation by adopting an instance-specific viewpoint. We introduce a simple randomized algorithm that, given a fixed finite dataset with two classes, with high probability constructs an interpolating three-layer neural network in polynomial time. The required number of parameters is linked to geometric properties of the two classes and their mutual arrangement. As a result, we obtain guarantees that are independent of the number of samples and hence move beyond worst-case memorization capacity bounds. We illustrate the effectiveness of the algorithm in non-pathological situations with extensive numerical experiments and link the insights back to the theoretical results.  ( 2 min )
    Model-Agnostic Covariate-Assisted Inference on Partially Identified Causal Effects. (arXiv:2310.08115v1 [econ.EM])
    Many causal estimands are only partially identifiable since they depend on the unobservable joint distribution between potential outcomes. Stratification on pretreatment covariates can yield sharper partial identification bounds; however, unless the covariates are discrete with relatively small support, this approach typically requires consistent estimation of the conditional distributions of the potential outcomes given the covariates. Thus, existing approaches may fail under model misspecification or if consistency assumptions are violated. In this study, we propose a unified and model-agnostic inferential approach for a wide class of partially identified estimands, based on duality theory for optimal transport problems. In randomized experiments, our approach can wrap around any estimates of the conditional distributions and provide uniformly valid inference, even if the initial estimates are arbitrarily inaccurate. Also, our approach is doubly robust in observational studies. Notably, this property allows analysts to use the multiplier bootstrap to select covariates and models without sacrificing validity even if the true model is not included. Furthermore, if the conditional distributions are estimated at semiparametric rates, our approach matches the performance of an oracle with perfect knowledge of the outcome model. Finally, we propose an efficient computational framework, enabling implementation on many practical problems in causal inference.  ( 2 min )
    Generative modeling of time-dependent densities via optimal transport and projection pursuit. (arXiv:2304.09663v2 [stat.ML] UPDATED)
    Motivated by the computational difficulties incurred by popular deep learning algorithms for the generative modeling of temporal densities, we propose a cheap alternative which requires minimal hyperparameter tuning and scales favorably to high dimensional problems. In particular, we use a projection-based optimal transport solver [Meng et al., 2019] to join successive samples and subsequently use transport splines [Chewi et al., 2020] to interpolate the evolving density. When the sampling frequency is sufficiently high, the optimal maps are close to the identity and are thus computationally efficient to compute. Moreover, the training process is highly parallelizable as all optimal maps are independent and can thus be learned simultaneously. Finally, the approach is based solely on numerical linear algebra rather than minimizing a nonconvex objective function, allowing us to easily analyze and control the algorithm. We present several numerical experiments on both synthetic and real-world datasets to demonstrate the efficiency of our method. In particular, these experiments show that the proposed approach is highly competitive compared with state-of-the-art normalizing flows conditioned on time across a wide range of dimensionalities.  ( 3 min )
    Online RL in Linearly $q^\pi$-Realizable MDPs Is as Easy as in Linear MDPs If You Learn What to Ignore. (arXiv:2310.07811v1 [cs.LG])
    We consider online reinforcement learning (RL) in episodic Markov decision processes (MDPs) under the linear $q^\pi$-realizability assumption, where it is assumed that the action-values of all policies can be expressed as linear functions of state-action features. This class is known to be more general than linear MDPs, where the transition kernel and the reward function are assumed to be linear functions of the feature vectors. As our first contribution, we show that the difference between the two classes is the presence of states in linearly $q^\pi$-realizable MDPs where for any policy, all the actions have approximately equal values, and skipping over these states by following an arbitrarily fixed policy in those states transforms the problem to a linear MDP. Based on this observation, we derive a novel (computationally inefficient) learning algorithm for linearly $q^\pi$-realizable MDPs that simultaneously learns what states should be skipped over and runs another learning algorithm on the linear MDP hidden in the problem. The method returns an $\epsilon$-optimal policy after $\text{polylog}(H, d)/\epsilon^2$ interactions with the MDP, where $H$ is the time horizon and $d$ is the dimension of the feature vectors, giving the first polynomial-sample-complexity online RL algorithm for this setting. The results are proved for the misspecified case, where the sample complexity is shown to degrade gracefully with the misspecification error.  ( 3 min )
    Conformal inference for regression on Riemannian Manifolds. (arXiv:2310.08209v1 [stat.ML])
    Regression on manifolds, and, more broadly, statistics on manifolds, has garnered significant importance in recent years due to the vast number of applications for this type of data. Circular data is a classic example, but so is data in the space of covariance matrices, data on the Grassmannian manifold obtained as a result of principal component analysis, among many others. In this work we investigate prediction sets for regression scenarios when the response variable, denoted by $Y$, resides in a manifold, and the covariable, denoted by X, lies in Euclidean space. This extends the concepts delineated in [Lei and Wasserman, 2014] to this novel context. Aligning with traditional principles in conformal inference, these prediction sets are distribution-free, indicating that no specific assumptions are imposed on the joint distribution of $(X, Y)$, and they maintain a non-parametric character. We prove the asymptotic almost sure convergence of the empirical version of these regions on the manifold to their population counterparts. The efficiency of this method is shown through a comprehensive simulation study and an analysis involving real-world data.  ( 2 min )
    On Extreme Value Asymptotics of Projected Sample Covariances in High Dimensions with Applications in Finance and Convolutional Networks. (arXiv:2310.08150v1 [math.ST])
    Maximum-type statistics of certain functions of the sample covariance matrix of high-dimensional vector time series are studied to statistically confirm or reject the null hypothesis that a data set has been collected under normal conditions. The approach generalizes the case of the maximal deviation of the sample autocovariances function from its assumed values. Within a linear time series framework it is shown that Gumbel-type extreme value asymptotics holds true. As applications we discuss long-only mimimal-variance portfolio optimization and subportfolio analysis with respect to idiosyncratic risks, ETF index tracking by sparse tracking portfolios, convolutional deep learners for image analysis and the analysis of array-of-sensors data.  ( 2 min )
    Local Graph Clustering with Noisy Labels. (arXiv:2310.08031v1 [cs.LG])
    The growing interest in machine learning problems over graphs with additional node information such as texts, images, or labels has popularized methods that require the costly operation of processing the entire graph. Yet, little effort has been made to the development of fast local methods (i.e. without accessing the entire graph) that extract useful information from such data. To that end, we propose a study of local graph clustering using noisy node labels as a proxy for additional node information. In this setting, nodes receive initial binary labels based on cluster affiliation: 1 if they belong to the target cluster and 0 otherwise. Subsequently, a fraction of these labels is flipped. We investigate the benefits of incorporating noisy labels for local graph clustering. By constructing a weighted graph with such labels, we study the performance of graph diffusion-based local clustering method on both the original and the weighted graphs. From a theoretical perspective, we consider recovering an unknown target cluster with a single seed node in a random graph with independent noisy node labels. We provide sufficient conditions on the label noise under which, with high probability, using diffusion in the weighted graph yields a more accurate recovery of the target cluster. This approach proves more effective than using the given labels alone or using diffusion in the label-free original graph. Empirically, we show that reliable node labels can be obtained with just a few samples from an attributed graph. Moreover, utilizing these labels via diffusion in the weighted graph leads to significantly better local clustering performance across several real-world datasets, improving F1 scores by up to 13%.  ( 3 min )
    Evaluation of ChatGPT-Generated Medical Responses: A Systematic Review and Meta-Analysis. (arXiv:2310.08410v1 [stat.ME])
    Large language models such as ChatGPT are increasingly explored in medical domains. However, the absence of standard guidelines for performance evaluation has led to methodological inconsistencies. This study aims to summarize the available evidence on evaluating ChatGPT's performance in medicine and provide direction for future research. We searched ten medical literature databases on June 15, 2023, using the keyword "ChatGPT". A total of 3520 articles were identified, of which 60 were reviewed and summarized in this paper and 17 were included in the meta-analysis. The analysis showed that ChatGPT displayed an overall integrated accuracy of 56% (95% CI: 51%-60%, I2 = 87%) in addressing medical queries. However, the studies varied in question resource, question-asking process, and evaluation metrics. Moreover, many studies failed to report methodological details, including the version of ChatGPT and whether each question was used independently or repeatedly. Our findings revealed that although ChatGPT demonstrated considerable potential for application in healthcare, the heterogeneity of the studies and insufficient reporting may affect the reliability of these results. Further well-designed studies with comprehensive and transparent reporting are needed to evaluate ChatGPT's performance in medicine.  ( 2 min )
    Extensions of Heterogeneity in Integration and Prediction (HIP) with R Shiny Application. (arXiv:2310.08426v1 [stat.ME])
    Multiple data views measured on the same set of participants is becoming more common and has the potential to deepen our understanding of many complex diseases by analyzing these different views simultaneously. Equally important, many of these complex diseases show evidence of subgroup heterogeneity (e.g., by sex or race). HIP (Heterogeneity in Integration and Prediction) is among the first methods proposed to integrate multiple data views while also accounting for subgroup heterogeneity to identify common and subgroup-specific markers of a particular disease. However, HIP is applicable to continuous outcomes and requires programming expertise by the user. Here we propose extensions to HIP that accommodate multi-class, Poisson, and Zero-Inflated Poisson outcomes while retaining the benefits of HIP. Additionally, we introduce an R Shiny application, accessible on shinyapps.io at https://multi-viewlearn.shinyapps.io/HIP_ShinyApp/, that provides an interface with the Python implementation of HIP to allow more researchers to use the method anywhere and on any device. We applied HIP to identify genes and proteins common and specific to males and females that are associated with exacerbation frequency. Although some of the identified genes and proteins show evidence of a relationship with chronic obstructive pulmonary disease (COPD) in existing literature, others may be candidates for future research investigating their relationship with COPD. We demonstrate the use of the Shiny application with a publicly available data. An R-package for HIP would be made available at https://github.com/lasandrall/HIP.  ( 3 min )
    Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts. (arXiv:2310.05898v2 [cs.LG] UPDATED)
    Lion (Evolved Sign Momentum), a new optimizer discovered through program search, has shown promising results in training large AI models. It performs comparably or favorably to AdamW but with greater memory efficiency. As we can expect from the results of a random search program, Lion incorporates elements from several existing algorithms, including signed momentum, decoupled weight decay, Polak, and Nesterov momentum, but does not fit into any existing category of theoretically grounded optimizers. Thus, even though Lion appears to perform well as a general-purpose optimizer for a wide range of tasks, its theoretical basis remains uncertain. This lack of theoretical clarity limits opportunities to further enhance and expand Lion's efficacy. This work aims to demystify Lion. Based on both continuous-time and discrete-time analysis, we demonstrate that Lion is a theoretically novel and principled approach for minimizing a general loss function $f(x)$ while enforcing a bound constraint $\|x\|_\infty \leq 1/\lambda$. Lion achieves this through the incorporation of decoupled weight decay, where $\lambda$ represents the weight decay coefficient. Our analysis is made possible by the development of a new Lyapunov function for the Lion updates. It applies to a broader family of Lion-$\kappa$ algorithms, where the $\text{sign}(\cdot)$ operator in Lion is replaced by the subgradient of a convex function $\kappa$, leading to the solution of a general composite optimization problem of $\min_x f(x) + \kappa^*(x)$. Our findings provide valuable insights into the dynamics of Lion and pave the way for further improvements and extensions of Lion-related algorithms.  ( 3 min )
    Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining. (arXiv:2310.08566v1 [cs.LG])
    Large transformer models pretrained on offline reinforcement learning datasets have demonstrated remarkable in-context reinforcement learning (ICRL) capabilities, where they can make good decisions when prompted with interaction trajectories from unseen environments. However, when and how transformers can be trained to perform ICRL have not been theoretically well-understood. In particular, it is unclear which reinforcement-learning algorithms transformers can perform in context, and how distribution mismatch in offline training data affects the learned algorithms. This paper provides a theoretical framework that analyzes supervised pretraining for ICRL. This includes two recently proposed training methods -- algorithm distillation and decision-pretrained transformers. First, assuming model realizability, we prove the supervised-pretrained transformer will imitate the conditional expectation of the expert algorithm given the observed trajectory. The generalization error will scale with model capacity and a distribution divergence factor between the expert and offline algorithms. Second, we show transformers with ReLU attention can efficiently approximate near-optimal online reinforcement learning algorithms like LinUCB and Thompson sampling for stochastic linear bandits, and UCB-VI for tabular Markov decision processes. This provides the first quantitative analysis of the ICRL capabilities of transformers pretrained from offline trajectories.  ( 2 min )
    Variable Selection for Kernel Two-Sample Tests. (arXiv:2302.07415v3 [stat.ML] UPDATED)
    We consider the variable selection problem for two-sample tests, aiming to select the most informative variables to distinguish samples from two groups. To solve this problem, we propose a framework based on the kernel maximum mean discrepancy (MMD). Our approach seeks a group of variables with a pre-specified size that maximizes the variance-regularized MMD statistics. This formulation also corresponds to the minimization of asymptotic type-II error while controlling type-I error, as studied in the literature. We present mixed-integer programming formulations and develop exact and approximation algorithms with performance guarantees for different choices of kernel functions. Furthermore, we provide a statistical testing power analysis of our proposed framework. Experiment results on synthetic and real datasets demonstrate the superior performance of our approach.  ( 2 min )
    Lattice real-time simulations with learned optimal kernels. (arXiv:2310.08053v1 [hep-lat])
    We present a simulation strategy for the real-time dynamics of quantum fields, inspired by reinforcement learning. It builds on the complex Langevin approach, which it amends with system specific prior information, a necessary prerequisite to overcome this exceptionally severe sign problem. The optimization process underlying our machine learning approach is made possible by deploying inherently stable solvers of the complex Langevin stochastic process and a novel optimality criterion derived from insight into so-called boundary terms. This conceptual and technical progress allows us to both significantly extend the range of real-time simulations in 1+1d scalar field theory beyond the state-of-the-art and to avoid discretization artifacts that plagued previous real-time field theory simulations. Limitations of and promising future directions are discussed.  ( 2 min )
    On the Computational Complexity of Private High-dimensional Model Selection via the Exponential Mechanism. (arXiv:2310.07852v1 [stat.ML])
    We consider the problem of model selection in a high-dimensional sparse linear regression model under the differential privacy framework. In particular, we consider the problem of differentially private best subset selection and study its utility guarantee. We adopt the well-known exponential mechanism for selecting the best model, and under a certain margin condition, we establish its strong model recovery property. However, the exponential search space of the exponential mechanism poses a serious computational bottleneck. To overcome this challenge, we propose a Metropolis-Hastings algorithm for the sampling step and establish its polynomial mixing time to its stationary distribution in the problem parameters $n,p$, and $s$. Furthermore, we also establish approximate differential privacy for the final estimates of the Metropolis-Hastings random walk using its mixing property. Finally, we also perform some illustrative simulations that echo the theoretical findings of our main results.  ( 2 min )
    Learning Regularized Monotone Graphon Mean-Field Games. (arXiv:2310.08089v1 [cs.GT])
    This paper studies two fundamental problems in regularized Graphon Mean-Field Games (GMFGs). First, we establish the existence of a Nash Equilibrium (NE) of any $\lambda$-regularized GMFG (for $\lambda\geq 0$). This result relies on weaker conditions than those in previous works for analyzing both unregularized GMFGs ($\lambda=0$) and $\lambda$-regularized MFGs, which are special cases of GMFGs. Second, we propose provably efficient algorithms to learn the NE in weakly monotone GMFGs, motivated by Lasry and Lions [2007]. Previous literature either only analyzed continuous-time algorithms or required extra conditions to analyze discrete-time algorithms. In contrast, we design a discrete-time algorithm and derive its convergence rate solely under weakly monotone conditions. Furthermore, we develop and analyze the action-value function estimation procedure during the online learning process, which is absent from algorithms for monotone GMFGs. This serves as a sub-module in our optimization algorithm. The efficiency of the designed algorithm is corroborated by empirical evaluations.  ( 2 min )
    Efficient Integrators for Diffusion Generative Models. (arXiv:2310.07894v1 [cs.LG])
    Diffusion models suffer from slow sample generation at inference time. Therefore, developing a principled framework for fast deterministic/stochastic sampling for a broader class of diffusion models is a promising direction. We propose two complementary frameworks for accelerating sample generation in pre-trained models: Conjugate Integrators and Splitting Integrators. Conjugate integrators generalize DDIM, mapping the reverse diffusion dynamics to a more amenable space for sampling. In contrast, splitting-based integrators, commonly used in molecular dynamics, reduce the numerical simulation error by cleverly alternating between numerical updates involving the data and auxiliary variables. After extensively studying these methods empirically and theoretically, we present a hybrid method that leads to the best-reported performance for diffusion models in augmented spaces. Applied to Phase Space Langevin Diffusion [Pandey & Mandt, 2023] on CIFAR-10, our deterministic and stochastic samplers achieve FID scores of 2.11 and 2.36 in only 100 network function evaluations (NFE) as compared to 2.57 and 2.63 for the best-performing baselines, respectively. Our code and model checkpoints will be made publicly available at \url{https://github.com/mandt-lab/PSLD}.  ( 2 min )
    How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?. (arXiv:2310.08391v1 [stat.ML])
    Transformers pretrained on diverse tasks exhibit remarkable in-context learning (ICL) capabilities, enabling them to solve unseen tasks solely based on input contexts without adjusting model parameters. In this paper, we study ICL in one of its simplest setups: pretraining a linearly parameterized single-layer linear attention model for linear regression with a Gaussian prior. We establish a statistical task complexity bound for the attention model pretraining, showing that effective pretraining only requires a small number of independent tasks. Furthermore, we prove that the pretrained model closely matches the Bayes optimal algorithm, i.e., optimally tuned ridge regression, by achieving nearly Bayes optimal risk on unseen tasks under a fixed context length. These theoretical findings complement prior experimental research and shed light on the statistical foundations of ICL.  ( 2 min )
    Personalised dynamic super learning: an application in predicting hemodiafiltration's convection volumes. (arXiv:2310.08479v1 [stat.ME])
    Obtaining continuously updated predictions is a major challenge for personalised medicine. Leveraging combinations of parametric regressions and machine learning approaches, the personalised online super learner (POSL) can achieve such dynamic and personalised predictions. We adapt POSL to predict a repeated continuous outcome dynamically and propose a new way to validate such personalised or dynamic prediction models. We illustrate its performance by predicting the convection volume of patients undergoing hemodiafiltration. POSL outperformed its candidate learners with respect to median absolute error, calibration-in-the-large, discrimination, and net benefit. We finally discuss the choices and challenges underlying the use of POSL.  ( 2 min )
    L2P: Learning to Place for Estimating Heavy-Tailed Distributed Outcomes. (arXiv:1908.04628v3 [cs.LG] UPDATED)
    Many real-world prediction tasks have outcome variables that have characteristic heavy-tail distributions. Examples include copies of books sold, auction prices of art pieces, demand for commodities in warehouses, etc. By learning heavy-tailed distributions, "big and rare" instances (e.g., the best-sellers) will have accurate predictions. Most existing approaches are not dedicated to learning heavy-tailed distribution; thus, they heavily under-predict such instances. To tackle this problem, we introduce Learning to Place (L2P), which exploits the pairwise relationships between instances for learning. In its training phase, L2P learns a pairwise preference classifier: is instance A > instance B? In its placing phase, L2P obtains a prediction by placing the new instance among the known instances. Based on its placement, the new instance is then assigned a value for its outcome variable. Experiments on real data show that L2P outperforms competing approaches in terms of accuracy and ability to reproduce heavy-tailed outcome distribution. In addition, L2P provides an interpretable model by placing each predicted instance in relation to its comparable neighbors. Interpretable models are highly desirable when lives and treasure are at stake.  ( 3 min )
    Statistical Performance Guarantee for Selecting Those Predicted to Benefit Most from Treatment. (arXiv:2310.07973v1 [stat.ME])
    Across a wide array of disciplines, many researchers use machine learning (ML) algorithms to identify a subgroup of individuals, called exceptional responders, who are likely to be helped by a treatment the most. A common approach consists of two steps. One first estimates the conditional average treatment effect or its proxy using an ML algorithm. They then determine the cutoff of the resulting treatment prioritization score to select those predicted to benefit most from the treatment. Unfortunately, these estimated treatment prioritization scores are often biased and noisy. Furthermore, utilizing the same data to both choose a cutoff value and estimate the average treatment effect among the selected individuals suffer from a multiple testing problem. To address these challenges, we develop a uniform confidence band for experimentally evaluating the sorted average treatment effect (GATES) among the individuals whose treatment prioritization score is at least as high as any given quantile value, regardless of how the quantile is chosen. This provides a statistical guarantee that the GATES for the selected subgroup exceeds a certain threshold. The validity of the proposed methodology depends solely on randomization of treatment and random sampling of units without requiring modeling assumptions or resampling methods. This widens its applicability including a wide range of other causal quantities. A simulation study shows that the empirical coverage of the proposed uniform confidence bands is close to the nominal coverage when the sample is as small as 100. We analyze a clinical trial of late-stage prostate cancer and find a relatively large proportion of exceptional responders with a statistical performance guarantee.  ( 3 min )
    Towards a Unified Analysis of Kernel-based Methods Under Covariate Shift. (arXiv:2310.08237v1 [stat.ML])
    Covariate shift occurs prevalently in practice, where the input distributions of the source and target data are substantially different. Despite its practical importance in various learning problems, most of the existing methods only focus on some specific learning tasks and are not well validated theoretically and numerically. To tackle this problem, we propose a unified analysis of general nonparametric methods in a reproducing kernel Hilbert space (RKHS) under covariate shift. Our theoretical results are established for a general loss belonging to a rich loss function family, which includes many commonly used methods as special cases, such as mean regression, quantile regression, likelihood-based classification, and margin-based classification. Two types of covariate shift problems are the focus of this paper and the sharp convergence rates are established for a general loss function to provide a unified theoretical analysis, which concurs with the optimal results in literature where the squared loss is used. Extensive numerical studies on synthetic and real examples confirm our theoretical findings and further illustrate the effectiveness of our proposed method.  ( 2 min )
    Towards the Fundamental Limits of Knowledge Transfer over Finite Domains. (arXiv:2310.07838v1 [cs.LG])
    We characterize the statistical efficiency of knowledge transfer through $n$ samples from a teacher to a probabilistic student classifier with input space $\mathcal S$ over labels $\mathcal A$. We show that privileged information at three progressive levels accelerates the transfer. At the first level, only samples with hard labels are known, via which the maximum likelihood estimator attains the minimax rate $\sqrt{{|{\mathcal S}||{\mathcal A}|}/{n}}$. The second level has the teacher probabilities of sampled labels available in addition, which turns out to boost the convergence rate lower bound to ${{|{\mathcal S}||{\mathcal A}|}/{n}}$. However, under this second data acquisition protocol, minimizing a naive adaptation of the cross-entropy loss results in an asymptotically biased student. We overcome this limitation and achieve the fundamental limit by using a novel empirical variant of the squared error logit loss. The third level further equips the student with the soft labels (complete logits) on ${\mathcal A}$ given every sampled input, thereby provably enables the student to enjoy a rate ${|{\mathcal S}|}/{n}$ free of $|{\mathcal A}|$. We find any Kullback-Leibler divergence minimizer to be optimal in the last case. Numerical simulations distinguish the four learners and corroborate our theory.  ( 2 min )

  • Open

    Savage Dall-e 3 delivers "Average reddit post"
    submitted by /u/Zimmax [link] [comments]
    AI — weekly megathread!
    News provided by aibrews.com Researchers present LLark: A Multimodal Foundation Model for Music - an open-source instruction-tuned multimodal model for music understanding. LLark is trained entirely from open-source music data and models [Demo | Paper] Researchers released LLaVA-1.5. LLaVA (Large Language and Vision Assistant) is an open-source large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding. LLaVA-1.5 achieved SoTA on 11 benchmarks, with just simple modifications to the original LLaVA and completed training in ~1 day on a single 8-A100 node [Demo | Paper | GitHub]. Voice AI platform ElevenLabs released AI Dubbing tool that enables users to automatically translate any audio in a video into a different language whil…
    The AI Boom Could Use a Shocking Amount of Electricity
    The rapid growth of artificial intelligence (AI) could lead to a significant increase in global electricity consumption, according to a peer-reviewed analysis published in Joule. The analysis estimates that if current trends continue, AI could drive the demand for electricity in data centers to consume at least 85.4 terawatt-hours annually, which is more than what many small countries use in a year. AI is energy-intensive, with both the training and inference phases requiring a significant amount of energy. The size of AI models, such as large language models, and the location of data centers also contribute to energy usage. Factors such as cooling requirements and the type of hardware used can impact energy consumption. Source : https://www.scientificamerican.com/article/the-ai-boom-could-use-a-shocking-amount-of-electricity/ submitted by /u/NuseAI [link] [comments]
    Lemur: Harmonizing Natural Language and Code for Language Agents
    Today's conversational bots like Claude and GPT can chat impressively but aren't great at complex planning or executing technical tasks. To overcome this, new research from HKU builds open-source AI agents that blend natural language and coding skills. They're called Lemur and Lemur-Chat. The researchers think achieving versatile real-world agents requires models that integrate both fluid natural language abilities and precise programming language control. Humans combine plain speech for higher-level goals with languages like Python when we need to plan intricately and execute exactly. AI needs both capacities too. But most existing models specialize in pure language or pure code. There's a separation that is limiting. The team created Lemur by pretraining the open-source Llama-2 on a massive mixed corpus with 10x more natural language than code. This improved its programming abilities while retaining conversational strength. Further instruction tuning optimized Lemur-Chat for following free-form directions in language. Experiments found Lemur surpassed specialized coding-only models like Codex in overall benchmarks. Lemur-Chat then exceeded Lemur by 15% after instruction tuning. More importantly, Lemur-Chat won 12/13 new "agent tests" designed to mimic real-world challenges needing both language and programming prowess. It beat alternatives at: Using tools like Python and Wikipedia to enhance reasoning Debugging code by leveraging error messages Improving the most from natural language feedback Exploring partially observable environments like cybersecurity and web browsing simulations. Lemur-Chat matched GPT-3.5 in many tests, closing the gap between commercial and open-source agents. TLDR: New open-source AI agents combine coding and language skills. Experiments show the combo unlocks more performance across technical challenges. Full summary is here. Paper is here. submitted by /u/Successful-Western27 [link] [comments]
    Henry Kissinger: The Path to AI Arms Control
    submitted by /u/ForeignAffairsMag [link] [comments]
    A 21-year-old won $40,000 for using AI to read the first word on a 2,000-year-old papyrus scroll buried by Mount Vesuvius
    submitted by /u/thisisinsider [link] [comments]
    "Special Announcement: John Carmack & Rich Sutton partner to accelerate development of AGI" | "Carmack and Sutton are deeply focused on developing a genuine AI prototype by 2030, including establishing, advancing, and documenting AGI signs of life"
    submitted by /u/Tao_Dragon [link] [comments]
    Dumbing down or wising up: how will generative AI change the way we think?
    submitted by /u/Jariiari7 [link] [comments]
    One-Minute Daily AI News 10/13/2023
    In a recent article published in the journal Nature, researchers developed AI Tool EVEscape, a tool to forecast which severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) strains have the highest potential to escape host immunity.[1] Microsoft seems to be working on the possible development of an artificial intelligence (AI) system that can understand and resolve customer support requests using natural language processing.[2] Google’s Search Generative Experience (SGE) will let you create images right from a text prompt starting Thursday.[3] The Biden administration is considering closing a loophole that gives Chinese companies access to American artificial intelligence (AI) chips through units located overseas, according to four people familiar with the matter.[4] Sources: [1] https://www.news-medical.net/news/20231012/EVScape-New-tool-to-forecast-which-SARS-CoV-2-variants-could-dodge-our-immunity.aspx [2] https://winbuzzer.com/2023/10/11/microsoft-gears-up-for-a-revolutionary-natural-language-customer-support-ai-xcxwbn/ [3] https://www.theverge.com/2023/10/12/23913337/google-ai-powered-search-sge-images-written-drafts [4] https://www.reuters.com/technology/biden-eyes-adding-ai-chip-curbs-chinese-companies-abroad-2023-10-13/ submitted by /u/Excellent-Target-847 [link] [comments]
    I’ve created a audiobook generator anyone got any books to test on it? Each character is given a different voice.
    Also if anyone has anyone who should be a voice actor included in it it can also clone voices. Idk I need to make sure it works for a wide variety of books. As long as they don’t use ‘ for quotes cause the computer getts that confused when “ I’ve “ and such uses the same symbol submitted by /u/Impossible_Belt_7757 [link] [comments]
    Check out the latest episode of my history podcast on the future of A.I.!
    submitted by /u/ErikSlader713 [link] [comments]
    Drew a picture in paint, threw it in hotpot, and it came out a stylish, halloweenish picure. Damn this stuff is amazing.
    submitted by /u/kipaxbooks [link] [comments]
  • Open

    [P] App for iOS and M1 macOS for image bounding box annotation
    ClassifyML is an application for creating specialised image datasets for use with an ML training algorithm. Simply import your chosen images into the app via file manager, drag'n'drop or the on device camera and create your bounding boxes and then export your images and JSON into a structured folder. LINK: https://apps.apple.com/app/classify-ml/id6461013113 https://preview.redd.it/dicsq9d3k1ub1.png?width=313&format=png&auto=webp&s=7976a61f599c658d948dec12db0b8ec93274ad93 https://preview.redd.it/3tswxdd3k1ub1.png?width=313&format=png&auto=webp&s=56ca30546984402f4dbba628b73732918e921758 https://preview.redd.it/y0xelmz3k1ub1.png?width=313&format=png&auto=webp&s=a755ea61bc247c6aacb61a31c700e4e80a1ed69f submitted by /u/LiamRogers99 [link] [comments]  ( 9 min )
    [D] What are the best resources for learning reinforcement learning?
    Recently I came across Open AI's Spinning Up Project, which seems to be well structured, but quite introductory. What are some resources you use for learning RL? submitted by /u/OwnAd9305 [link] [comments]  ( 9 min )
    [D] LLM for entity/scene recognition in a book?
    Hello, I'm looking for an open source LLM that can extract all the characters from an inputted book, and isolate passages with descriptive writing that involves imagery. Can anyone suggest me something? Thanks! submitted by /u/slomorosh [link] [comments]  ( 9 min )
    [P] Deploy and Run LLMs at the Edge: Use Code Llama to Generate a Dashboard in a Network Restricted Environment
    In this blog, we explore different definitions of “the edge,” and understand the factors driving AI/ML to the edge. We examine why the trends of LLMs and edge computing are intersecting now, and how teams can take advantage of their combined power today. We also demonstrate how LLMs can be used in an edge environment to generate insights for a real-world use case today. Consider a geologist working in a remote oil field who is responsible for building and analyzing 3D models of oil fields to determine production capacity and the impact on profitability. In this demo, we walk through how Code Llama, Chassisml.io, and Modzy could be used to build a dashboard that geologists could use to analyze well data in real-time in a remote, network restricted environment, allowing for LLM insights generated at the edge. Learn more: https://www.modzy.com/modzy-blog/deploy-and-run-llms-at-the-edge submitted by /u/modzykirsten [link] [comments]  ( 9 min )
    [D] ICLR submissions are out. Discussion thread
    https://openreview.net/group?id=ICLR.cc/2024/Conference submitted by /u/_puhsu [link] [comments]  ( 8 min )
    [D] Vscode issue
    I am running AutoTokenizer from transformers on vscode. The vscode crashes showing error and not responding. I don't understand what's wrong. submitted by /u/ArtichokeOne5897 [link] [comments]  ( 8 min )
    "[P]" Utilizing Machine Learning Techniques for Document Digitalization Project
    Hey Guys, ​ I am currently spearheading a project for a client in the insurance industry, with a primary objective being the digitalization of thousands of hardcopy contracts. The ultimate goal is to automatically extract particular information from these newly digital documents, namely "date", "insurance premium", "insurance type", and "contractor's name". However, I anticipate a level of variability in terms of exact terminology used, particularly with regards to "insurance premium" and "insurance type". (There is no handwritten text) ​ I am keen on sharing the methodology I intend to apply for this project and invite your invaluable feedback and suggestions: ​ - Firstly, I'll execute the scanning/digitalization of the documents manually. - Post this, I plan to utilize Tesseract in combination with Python for the extraction of text from the preprocessed images. - I am considering using libraries such as NLTK or spaCy to preprocess this text (this will involve steps like lower casing, removing punctuations, etc.) - Finally, I plan to train a custom model for Named Entity Recognition (NER), to accommodate the potential semantic variations in entity labeling which are specific to entities like "insurance premium" and "insurance type". ​ I would be immensely grateful if I could gain your insights on the above-proposed pipeline - Are there any glaring pitfalls I need to avoid or perhaps some improvements that I could incorporate? Your expert advice can certainly help ensure the success of this venture. ​ Many thanks in anticipation for your time and valuable inputs! submitted by /u/Background_Thanks604 [link] [comments]  ( 9 min )
    [News] AI & ML conference in San Francisco [Special discount code for this subreddit]
    I work for this database company SingleStore and we are hosting a AI & ML conference in San Francisco on 17th of October, 2023. It is an in-person conference with amazing speakers line-up like Harrison Chase, co-founder and CEO of LangChain and many more. We will have hands-on workshops, swags giveaway and much more. I don't know if it makes sense to share this but I believe it might help some of you near San Francisco to go and meet the industry leaders and network with other data engineering folks. Use my discount coupon code 'PAVAN100OFF' to avail 100% off on the ticket price. (the original ticket price is $199) Get your tickets now! submitted by /u/PavanBelagatti [link] [comments]  ( 9 min )
    Using RAG on CoreML version of Llama2 [P]
    Has anyone ever attempted this or finetuning before on the CoreML version? I’m currently trying to and I’m not even sure where to start tbh. CoreML version of Llama 2: https://huggingface.co/coreml-projects/Llama-2-7b-chat-coreml submitted by /u/Inside-Aromatic [link] [comments]  ( 9 min )
    [D] How does L1 Regularization able to drive a coefficient to zero?
    Hi all, I’m studying the concepts of machine learning. However, I am stuck because I still don’t see how introducing a penalty using lasso regression can drive some parameter coefficients to zero. When doing the calculations, I only get the final value (ordinary least squares + penalty) and don’t directly see a coefficient value being reduced. I've looked at many materials and resources trying to explain this, but I still can't see how it's done. I think the important thing for me is seeing it going to zero or, at the very least, seeing it during calculation. Is there anyone that can help explain this better? Or, If you know of a formula that I can derive that, during the derivation process, shows a coefficient being reduced or set to zero, that would also help. Also, any good resources on the topic would be appreciated. Edit: This post should have been posted in r/learnmachinelearning here is a link to the same post in that subreddit submitted by /u/thismymind [link] [comments]  ( 9 min )
    [D] How do you pre-pay OpenaAI compute credit with university funds ?
    I am an academic and I have some funding. However, I cannot just plug in my lab card with a recurrent payment, procedures don't allow it. Is there a way to "top up" some compute credits on the OpenAI accounts ? Is anyone having the same problem ? Thanks. submitted by /u/Jean-Porte [link] [comments]  ( 9 min )
    [R] Seeking Guidance on Efficiently Classifying and Cleansing Automotive Data with Python
    Hi, we are working on a project that involves dealing with messy automotive data, and are looking for guidance on possible approaches and tools. We aim to map messy supplier data of car makes/models to standardized values from our approved list. This requires handling various challenges like typos, varied specificity, and sometimes research-based mapping (e.g., using engine size and production year to ascertain a chassis code). eg: If a supplier provides 'BNW 316i saloon 1990-1994', (typo intentional) we would like to match it to our standardized value of 'BMW 3 Series (E36)'. Our old approach has been a combination of utilizing fuzzy matching for typos/basic matching and time consuming manual processing and verification. We have recently experimented with using GPT for providing guess…  ( 10 min )
    [R] Lemur: Harmonizing Natural Language and Code for Language Agents
    Today's conversational bots like Claude and GPT can chat impressively but aren't great at complex planning or executing technical tasks. To overcome this, new research from HKU builds open-source AI agents that blend natural language and coding skills. They're called Lemur and Lemur-Chat. The researchers think achieving versatile real-world agents requires models that integrate both fluid natural language abilities and precise programming language control. Humans combine plain speech for higher-level goals with languages like Python when we need to plan intricately and execute exactly. AI needs both capacities too. But most existing models specialize in pure language or pure code. There's a separation that is limiting. The team created Lemur by pretraining the open-source Llama-2 on a massive mixed corpus with 10x more natural language than code. This improved its programming abilities while retaining conversational strength. Further instruction tuning optimized Lemur-Chat for following free-form directions in language. Experiments found Lemur surpassed specialized coding-only models like Codex in overall benchmarks. Lemur-Chat then exceeded Lemur by 15% after instruction tuning. More importantly, Lemur-Chat won 12/13 new "agent tests" designed to mimic real-world challenges needing both language and programming prowess. It beat alternatives at: Using tools like Python and Wikipedia to enhance reasoning Debugging code by leveraging error messages Improving the most from natural language feedback Exploring partially observable environments like cybersecurity and web browsing simulations. Lemur-Chat matched GPT-3.5 in many tests, closing the gap between commercial and open-source agents. TLDR: New open-source AI agents combine coding and language skills. Experiments show the combo unlocks more performance across technical challenges. Full summary is here. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [P] Introducing PPO and Rainbow DQN to our super fast evolutionary HPO reinforcement learning framework
    Hi, we've just released a new version of AgileRL, our evolutionary hyperparameter optimisation framework built for RL that is 10x faster than SOTA. We've introduced PPO, Rainbow DQN, some sophisticated replay buffers, and also collaborated with the Farama Foundation to create some tutorials (more on the way). Please check it out and take it for a spin. We're also looking for contributors so get in touch if you would like to be involved! https://github.com/AgileRL/AgileRL submitted by /u/nicku_a [link] [comments]  ( 9 min )
    [P] Free open-source ML observability course: starts October 16 🚀
    Hi everyone, I’m one of the creators of Evidently, an open-source (Apache 2.0) tool for production ML monitoring. We’ve just launched a free open course on ML observability that I wanted to share with the community. The course covers: 📚 Key concepts of ML monitoring and observability (data drift, data and model quality metrics, etc.) 🔡 Monitoring unstructured data (embeddings, texts, LLMs, etc.) 🛠 Different deployment architectures (batch ML monitoring jobs, near real-time ML monitoring, etc.) The course is free and open. All materials are public, with no sign-up required. You’ll work with open-source tools like Evidently, MLflow, Airflow, and Grafana. We’ve already published the first 12 videos with notes and code examples. We’ll add new lessons and deployment blueprints over the following weeks. The official course start date is October 16, 2023. You can also learn at your own pace. Course info and notes: https://learn.evidentlyai.com/ [Background] We’ve been working on Evidently since late 2020 and have spoken to 100s of data scientists, ML engineers, and ML platform teams in different industries. In this course, we tried to sum up answers to the frequent questions on the topic. It starts with high-level theoretical modules and goes to complete deployment blueprints. It is approachable for different levels of knowledge, and you can pick only the modules you are interested in. Looking forward to meeting you at the course! submitted by /u/mllena [link] [comments]  ( 9 min )
    Can I use ArcPro to do machine learning on point (numeric) data? [D] [R]
    I am trying to do machine learning in ArcPro, and I want to understand the relationship between x, y, numeric variable 1, numeric variable 2, and one nominal variable (classified; i.e. can be one of four values). I'd like to be able to predict numeric variable 1 based on everything else. Can ArcPro accommodate machine learning for anything other than raster type data. That is, can it be used to do machine learning on point (numeric) data? Thanks! submitted by /u/arcgis_123 [link] [comments]  ( 9 min )
    [R] TimeGPT : The first Generative Pretrained Transformer for Time-Series Forecasting
    In 2023, Transformers made significant breakthroughs in time-series forecasting For example, earlier this year, Zalando proved that scaling laws apply in time-series as well. Providing you have large datasets ( And yes, 100,000 time series of M4 are not enough - smallest 7B Llama was trained on 1 trillion tokens! ) Nixtla curated a 100B dataset of time-series and trained TimeGPT, the first foundation model on time-series. The results are unlike anything we have seen so far. I published the results in my latest article. I hope the research will be insightful for people who work on time-series projects. Link: https://aihorizonforecast.substack.com/p/timegpt-the-first-foundation-model Note: If you know any other good resources on very large benchmarks for time series models, feel free to add them below. ​ submitted by /u/nkafr [link] [comments]
    [R] Pointers to (deep) latent variable models that admit analytical approximations
    Hi everyone. I am aware that there is a plethora of deep generative models out there (e.g. variational autoencoders (VAE), GANs) that can model high-dimensional data as the images of latent variables under a non-linear mapping (typically neural network). In more traditional methods such as probabilistic PCA, the latent variables can be marginalised analytically. In Bayesian PCA (BPCA), we can additionally integrate out the linear mapping, from the latent space to the observation space, by adopting the variational lower bound that leads to closed form updates of the parameters. The Gaussian Process Latent Variable (GPLVM) model adopts a non-linear probabilistic mapping (a Gaussian process) that can be marginalised. These two models enjoy to a certain degree analytical solutions concerning the inference of the latent variables and the mapping. I have been wondering whether there is any research into more "complex" models (perhaps I should call them deep) that are capable of modelling more complex data distributions than the GPVLM and BPCA, but retain analytical solutions when inferring the posterior of the latent variables (like BPCA) or the mapping (like GPLVM)? What I like about the GPLVM and BPCA is that they possess an objective function (i.e. ELBO) that can be analytically optimised, as opposed to the intractable objective of VAEs that necessitates Monte-Carlo averages and stochastic gradient. Could somebody please point me to such examples of more complex generative models that admit analytical inference for working out the posterior of the latent variables or the mapping? ----- This has also been posted on stack exchange: https://ai.stackexchange.com/q/42418/61537 submitted by /u/ngiann [link] [comments]  ( 9 min )
    [D] I love teaching! But I don't have enough publication for it, what should I do?
    Do I love teaching? Oh, absolutely, YES a big YES! My time as a TA for countless semesters has been amazing. Staying after hours, spending long evenings and early mornings, to make each of my students find ease in debugging both easy-peasy and mind-boggling programs – it’s been a joy, truly. Watching those fresh faces, whom I introduced to Python in their first year ( intro to programming lab), now immerse themselves into my computer vision labs, exploring computer vision and deep learning in their third/forth year – it’s incredibly rewarding! And yeah my students kind of like me! after each semester I get tons of emails thanking me and my TAship review is always good. But, ugh, do I have enough publications to become faculty? A big fat NO! My efforts have been relentless, and everyone in my department would nod in agreement. But luck and reviewers? Not my best pals, apparently. So yeah, I don’t have a stack of 8 top-tier papers. I’ve managed to scrape together 3, and a few second tiers. My citation count is not that bad somewhere between 200 and 300-ish. Now, what’s next for me? Dive into the industry? become a high school teacher? Or perhaps, do a postdoc journey, fingers crossed for a sprinkle more luck and few more papers? Edit: This doesn't mean I don't like research, I actually love it too, I have done quite a few internship in quite big companies, most of the time they extend my intership and I even got publication out of one in 5 month. But I just like to teach a lot! strangely I got social anxiety every where other than my classrooms/labs. submitted by /u/LongjumpingSchool646 [link] [comments]  ( 9 min )
    [D] You don't need a Vector Database you just need a database
    I'm seeing some architectures come out from the LLM world that probably wouldn't survive the trip to production. If you choose a vector database how will you handle your other database needs? Then you'll need 2 databases. https://bionic-gpt.com/blog/you-dont-need-a-vector-database/ submitted by /u/purton_i [link] [comments]  ( 9 min )
    [D] Why back-propagation is intractable of MoCO key encoder?
    In the original paper of MoCo, it said that: Using a queue can make the dictionary large, but it also makes it intractable to update the key encoder by back-propagation (the gradient should propagate to all samples in the queue). First I thought that the main reason that the bp cannot imply on key encoder is that the queue operation is not differentable. But It seems not true. You can compute the gradient of all samples in the queue, then bp should be performed properly. See the code at the bottom. So WHAT IS THE REAL REASON THAT THE BP IS INTRACTABLE FOR KEY ENCODER? In my opinion, I think may be because of the large size of the queue (dictionary) which makes the memory explosive. python q = nn.Linear(768,128) k = nn.Linear(768,128) bs = 64 ks = 4095 model = nn.ModuleList([q,k]) x = torch.randn(bs, 768) optim = torch.optim.SGD(model.parameters(),lr=0.01) loss = nn.CrossEntropyLoss() def forward(x): xq = q(x) xk = k(x + 0.1) que = torch.rand(ks,128) pos = torch.einsum("nc,nc->n",xq,xk) neg = torch.einsum("nc,kc->nk",xq,que) out = torch.cat([pos.unsqueeze(-1),neg],dim=1) t = torch.zeros(out.shape[0],dtype=torch.long) l = loss(out,t) return l loss = forward(x) loss.backward() optim.step() submitted by /u/whishtLF [link] [comments]  ( 9 min )
    [D] Advisor rejects every idea I propose.
    A senior phd student at a moderately famous university. I have a reasonable number of accepted papers as first author in tier-1 conferences. I was thinking of going into academia, so recently I started proposing many ideas to my advisor so that I can mentor some junior students. However my advisor is rejecting every idea I suggest saying it won’t work. I’m feeling very dejected and I feel like I should give up going into academia. I don’t know what I’m expecting from here. Is your advisor like this too? submitted by /u/mildlyphd [link] [comments]  ( 9 min )
  • Open

    Batch calibration: Rethinking calibration for in-context learning and prompt engineering
    Posted by Han Zhou, Student Researcher, and Subhrajit Roy, Senior Research Scientist, Google Research Prompting large language models (LLMs) has become an efficient learning paradigm for adapting LLMs to a new task by conditioning on human-designed instructions. The remarkable in-context learning (ICL) ability of LLMs also leads to efficient few-shot learners that can generalize from few-shot input-label pairs. However, the predictions of LLMs are highly sensitive and even biased to the choice of templates, label spaces (such as yes/no, true/false, correct/incorrect), and demonstration examples, resulting in unexpected performance degradation and barriers for pursuing robust LLM applications. To address this problem, calibration methods have been developed to mitigate the effects of t…  ( 93 min )
  • Open

    Significance of AI in the development of software products
    Artificial Intelligence (AI) is emerging as a formidable force, revolutionizing how we conceive, create, and deliver software solutions. As technology advances at an unprecedented pace, the role of AI in this domain has become increasingly significant. It’s no longer just a buzzword; it’s a fundamental tool that promises to reshape the entire software development process.… Read More »Significance of AI in the development of software products The post Significance of AI in the development of software products appeared first on Data Science Central.  ( 19 min )
    Future of AI and data science – How to secure a bright career
    Companies, more often, pay attention to automation and innovation over proficiency and productivity. However, firms can maintain a balance between both due to the extensive usage of AI and data science programs. Here are the stats that show the impact of AI and data science in diverse sectors: Applications of AI and data science have… Read More »Future of AI and data science – How to secure a bright career The post Future of AI and data science – How to secure a bright career appeared first on Data Science Central.  ( 21 min )
  • Open

    A question
    What are the ways to create plasticity in neural network? Without using weights,bias and activation functions? submitted by /u/Sith_vader3 [link] [comments]  ( 8 min )
    Neural Networks project
    Hi ! My group (4 people) has chosen to make an application that translates ancient stone inscriptions to modern languages as our university project . We can use external libraries to process images that we are going to translate but as we understood we have to build the neural network ourselves from scratch. My questions are 1) is this possible to do within 10 months? 2) if so how would you approach it ? submitted by /u/sakith123 [link] [comments]
  • Open

    From Skylines to Streetscapes: How SHoP Architects Brings Innovative Designs to Life
    At SHoP Architects, a New York City-based architectural firm, Mengyi Fan and her team aim to inspire industry professionals to create visual masterpieces by incorporating emerging technologies. Fan, the director of visualization at SHoP, has expertise that spans the fields of architectural visualization and design. She takes a definitive, novel and enduring approach to designing Read article >  ( 6 min )
  • Open

    Introducing PPO and Rainbow DQN to our super fast evolutionary HPO reinforcement learning framework
    Hi, we've just released a new version of AgileRL, our evolutionary hyperparameter optimisation framework built for RL that is 10x faster than SOTA. We've introduced PPO, Rainbow DQN, some sophisticated replay buffers, and also collaborated with the Farama Foundation to create some tutorials (more on the way). Please check it out and take it for a spin. We're also looking for contributors so get in touch if you would like to be involved! https://github.com/AgileRL/AgileRL submitted by /u/nicku_a [link] [comments]
    Masking state transitions in policy updates for invalid actions?
    I am currently dealing with an environment, that most of the time (90% of all state transitions) clips the action selected from the agent. Sometimes even down to the point where the action selected by the agent is completly ignored. This causes a lot of problems, because for example the entropy bonus does not works, since the agent learns to select any action, when it doesn't matter anyway but selects the same action (low entropy) when the actions have an effect. Using the PPO algorithm I was thinking of masking the state transitions in the policy updates, according to how much the action was clipped in the environment. And I thought V(s) should be masked, because it can still learn from the state transitions even if the action was effectively ignored by the environment. submitted by /u/flxh13 [link] [comments]
    A question about deterministic action selection at evaluation time
    I'm training some agents using fairly vanilla PPO on a hand-made environment. These agents learn to perform the task pretty well, but while I was examining their action probabilities during an evaluation episode, I had the idea to turn off deterministic action selection. To my surprise, allowing probabilistic action selection (as opposed to argmax action selection) actually improved performance in some cases. I had always thought that deterministic actions during evaluation was fairly standard, but now am thinking that maybe I missed something and that there are cases where you wouldn't want determinism? My question is: how common is it actually to use deterministic actions vs. probabilistic ones at evaluation time, and does anyone know of studies/papers/examples where the authors found probabilistic evaluation to outperform determinism? submitted by /u/Impallion [link] [comments]
    "A Simple Open-Loop Baseline for Reinforcement Learning Locomotion Tasks" Raffin et al. 2023
    submitted by /u/atooo57 [link] [comments]
    Looking for some advice regarding universal multi-head outputs
    Hey, So I am working on reinforcement learning package in C# (currently under heavy development): https://github.com/asieradzk/RL_Matrix My goal is to create something superior to unity's ML Agents for Godot to democratize access to reinforcement learning for people (without having them know what a tensor is) So far I've added some barebones DQN and PPO that (only output single discrete action) as proof of concept to test my code architecture. So I am going through the daunting task of having some universal workflow for setting up environments. For any shape observations and any count actions, both discrete and continuous. As I am finishing my multi-head multi-action output I've come to realise that there are many possible architectures I could setup multi head outputs, for instanc…
    Next state in turn based game
    To my knowledge, when using the Q Learning family algorithm, we must know the next state as well as the action spaces in couple with that observation in order to evaluate the reward for the next state with the target network. But I have some problem when trying to define this next state in turn turn-based game in which the agent have to make a certain number of actions and then wait for the opponent to do some actions before it can interact with the environment again. We can take Hearthstone as an example that each player have to wait for other to play a number of cards before can take any action. Currently, I have two options for this: - Treat the environment right after the agent's turn ended, which will lack the action space. - Treat the environment just before the agent's turn begins, which will have all the actions available that it can choose from but this will make the agent's last action very noisy. That state could be a good state if the opponent playing badly or they are very good and make our last decision seem like a very bad choice. Thanks in advance for any suggestions. If my problem is a common task that others have already solved many times, I will be very thankful for that keyword. submitted by /u/No-Concentrate-6037 [link] [comments]
    "Small batch deep reinforcement learning", Obando-Ceron et al 2023 {DM} (value-based agents explore & regularize better with small n)
    submitted by /u/gwern [link] [comments]

  • Open

    How are memories stored in neural networks? | The Hopfield Network #SoME2
    submitted by /u/keghn [link] [comments]
    A question
    How does the neural network process input that were same but shown different to the network model? submitted by /u/Sith_vader3 [link] [comments]
    I don't much about NN's. is this correct ?
    i gave chatgpt vision an illustration of neural network from The Principles of Deep Learning Theory. what to know how correct its reponse is here is the response: https://preview.redd.it/inqe5xukxptb1.png?width=453&format=png&auto=webp&s=6e1079baeae8235b0e03a677e4006d1077af36a8 submitted by /u/YeshwanthRam [link] [comments]
  • Open

    Who Will Benefit from AI?
    Artificial intelligence (AI) can provide "machine usefulness" for human workers, augmenting their jobs rather than replacing them. However, there is a concern that AI could lead to job displacement and reinforce economic inequality. MIT economist Daron Acemoglu emphasizes the importance of making AI more useful to humans and ensuring that the economic benefits are shared widely. He suggests that innovations that augment workers' tasks can lead to prosperity for the workforce. Acemoglu also highlights the need for worker power and the careful implementation of technology to achieve shared prosperity and productivity gains. Source : https://idss.mit.edu/news/who-will-benefit-from-ai/ submitted by /u/NuseAI [link] [comments]
    What's the most advanced free chatbot available?
    I just need three things for it: It must be knowledgeable about things, such as physics, math, hystory, books, geography, etc. It also must be original, with a high level of SEO and AI detection score. It must be available in Italy. The last part is essential. Claude 2 is very famous but with sms verification from usa (which I don't have and I don't want to give credit card info/pay to have) it's made almost impossible even with vpn. submitted by /u/luigirovatti1 [link] [comments]
    10 Powerful ChatGPT Hacks for SEO
    submitted by /u/Senior_tasteey [link] [comments]
    ChatGPT's Global Peace Plan
    Creating true, enduring, lasting peace on Earth is an ambitious and complex endeavor that requires multifaceted approaches. Here’s a bold, outside-the-box plan that may surprise you: Step 1: Establish a Global Consciousness: Educational Overhaul: Revamp global educational systems to foster empathy, understanding, and appreciation for diverse cultures, religions, and viewpoints from a young age. Step 2: Eradicate Poverty and Inequality: Universal Basic Assets (UBA): Implement a Universal Basic Assets program, where every person on Earth is granted a share of global resources. Step 3: Create a Single Global Governance Entity: World Federation: Establish a democratically elected World Federation that respects regional autonomy but has overriding authority on global issues like…
    When your AI says she loves you
    submitted by /u/thisisinsider [link] [comments]
    Anyone ever thought about training a video generating model, but backwards?
    Just had a random idea: What if you train a video generating AI, but feed it videos that are reversed? You could show it an image of a crashed car, and it would generate a video of the crash. Show it a broken vase, it would "repair" it. It could one day become like the "reconstruct crime scene" in Detroit: Become Human. What are your thoughts about this? submitted by /u/FluffyIllustrator805 [link] [comments]
    AI and science: what 1,600 researchers think
    A Nature survey of over 1,600 researchers reveals that AI tools are becoming increasingly common in science and are expected to be 'very important' or 'essential' in the next decade. Scientists express concerns about how AI is transforming research, including reliance on pattern recognition without understanding, bias in data, fraud, and irreproducible research. The survey shows that AI tools provide faster ways to process data, speed up computations, and save time and money. Among researchers who use AI, more than one-quarter believe AI tools will become 'essential' to their field in the next decade. Large language models like ChatGPT are mentioned as both impressive and concerning examples of AI tools in science. Source : https://www.nature.com/articles/d41586-023-02980-0 submitted by /u/NuseAI [link] [comments]
    Looking for AI text input like Artbreeder Mixer that combines images
    I'm looking for a (free) ai image generator like Artbreeder Mixer, that has functions that allow you to "morph" or mix images together via text prompts. Ive looked at a bunch already, and even tried adding the text of the different types in the prompts, bu Ive been getting separated results (like "cat" , "man", "head" wont combine the man and the cat, but rather give me un-morphed results, like a regular man, plus a cat in a suit with no human features. I even get a result with a man standing behind a cat! Ive tried StarryAI, imagecreator, wepik, cant afford midjourney or paid ones right now, some others I cant remember with no mixing... Artbreeder's interface, you can keep adding and it will mix them together. I made these images and others like them very easy in Artbreeder, but its plan is very limited - I could buy more credits, but I need to wait a few days (new job, not paid yet, broke today... lol): ​ morph between man and donkey Morph between angry rapper and gorilla SO, if anyone can suggest some free, or almost free (generous newbie credits?) that can do mixes like this - please point me in the right direction. submitted by /u/magusat999 [link] [comments]
    New York wants to be AI's world capital, in rivalry with San Francisco and Silicon Valley
    submitted by /u/norcalnatv [link] [comments]
    Could an AI-created profile picture help you get a job?
    Artificial intelligence (AI) is being used to create professional-looking profile pictures for job hunting websites like LinkedIn. Apps like Remini, Try It On AI, and AI Suit Up use AI-based software to generate slick profile photos that mimic the work of expert photographers. Users upload multiple selfies, and the AI software creates artificial photos with different hairstyles, clothing, and backdrops. While some find the results realistic, others think they look artificial. The AI services are popular because they are cheap or free, making them accessible to those who can't afford professional headshots. However, opinions are divided on whether AI-generated photos are beneficial or detrimental to self-esteem. Some believe that AI-generated photos allow individuals to put their best self forward and potentially increase their chances of being considered for opportunities. Others worry that relying on AI-generated photos may negatively impact self-worth and confidence. Recruiters generally do not consider whether a photo is AI-generated when evaluating job applications. Source : https://www.bbc.co.uk/news/business-67054382 submitted by /u/NuseAI [link] [comments]
    AI Tool for film footage notes
    Hi, im currently filming a documentary, but I’m so busy filming, i don’t have time to write notes on footage for the editor. Does anyone know of any ai tool that can help with this and save time and streamline this process? King regards submitted by /u/Brand0n_C [link] [comments]
    How AI will affect traditional and open source software industry?
    Hey folks, how would you guys see the effect of AI? Will the small softwares companies will go bankrupt? Since the lots of software are using tools like ChatGpt, Midway Journey etc. It just the starting of new AI technology era which will evolved over the years. In that time we will see more and more AI software which will likely provide efficient and better solution as compare to traditional and open source software. So my question is how do you guys see this? Will small software companies or open source software programs days are number? submitted by /u/Haziq12345 [link] [comments]
    One-Minute Daily AI News 10/11/2023
    Opera has launched Opera One — a new version of the browser that comes packaged with an AI-powered chatbot called Aria.[1] Adobe is going all in on AI, announcing three new generative AI models today that add powerful features to Illustrator and Adobe Express and vastly improve Photoshop’s text-to-image capabilities.[2] ‘South Park’ to Tackle AI for Next Event Special, Releases Teaser.[3] World’s first AI tutor launched in Australia to help students get through their exams.[4] Sources: [1] https://www.theverge.com/2023/6/21/23768888/opera-one-browser-aria-ai-assistant-chatbot [2] https://www.theverge.com/2023/10/10/23911114/adobe-max-firefly-generative-ai-model-photoshop-illustrator-express [3] https://www.hollywoodreporter.com/tv/tv-news/south-park-ai-joining-panderverse-1235615276/ [4] https://www.techguide.com.au/news/computers-news/worlds-first-ai-tutor-launched-in-australia-to-help-students-get-through-their-exams/ submitted by /u/Excellent-Target-847 [link] [comments]
    Cypher 2023: The Future of Simulation and Design is AI
    submitted by /u/Agitated-Spell3979 [link] [comments]
    Any ideas how this was created?
    submitted by /u/crispyTacoTrain [link] [comments]
    Web design tools
    I’m looking for input and advice on tools for web designers. I use Wordpress a lot, Magento some and frequently code by hand in html JavaScript and PHP. I know there are some AI tools out there now but I don’t know which are best and wanted to find out what people thoughts are on this subject. What tools are you using, for what, and why? Thanks! submitted by /u/PowerTarget [link] [comments]
  • Open

    [R] Researchers Identify Emergent Linear Structures in How LLMs Represent Truth
    LLMs' tendency to make up false statements (hallucinate) is a major concern. We need ways to inspect whether they really "know" something is true or not so we can reduce hallucinations. In a new paper, researchers found that LLMs contain an internal "truth vector" - an emergent linear structure that represents factual truth values. They had the insight to visualize how GPT represents simple true/false sentences. The true ones clustered together, while false ones clustered elsewhere - suggesting some kind of 'truth direction' in its learned representations. To test this, they trained linear "probes" on one dataset, and found they could generalize to accurately detect truth values in totally different datasets about other topics. They also directly modified the models to add or subtract the identified truth vectors from its processing of statements. This could flip assessments of truth value, showing the vector causally influences reasoning. Together, these findings provide evidence that neural networks can create emergent, linear structures that represent factual truth. This finding could eventually help make AI systems less prone to hallucinations and falsehoods. TLDR: LLMs can create emergent linear representations of truth. This sheds light on how AI represents abstract concepts and could help us reduce hallucinations. Full summary. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [D] Recommendations request for a guide to research publication
    I am working on a research topic in Data Engineering. Forgive me if this is a question frequently asked, I couldn't find this specifically in the FAQ. What are good publication tips and journals to publish in? I read through a few journals and all of them are big publications. What if I opt fot some upcoming or other niche (maybe data engineering) journals submitted by /u/Sherbhy [link] [comments]  ( 9 min )
    [R] SWE-bench: Can Language Models Resolve Real-world GitHub issues?
    We have a new benchmark out called SWE-bench (arxiv) It challenges LMs to solve real GitHub issues (feature requests & bug reports) from popular Python repos. Answers are validated using unit tests we crawled from those repos. The benchmark at swebench.com/ shows that even the strongest models, such as Claude 2 and GPT-4, get less than 5% accuracy. ​ We are here to answer any questions you may have. submitted by /u/ofirpress [link] [comments]  ( 9 min )
    [D] Sample probability diffusion models
    I would like to understand how I can calculate the probability that a sample belongs to the distribution a diffusion model was trained on. Say, I have an image of a car, and I would like to know whether this image belongs to the distribution that is estimated by the diffusion model. So I would like to know the probability between zero and one at the car belongs to this distribution Do you know how I technically can do this? submitted by /u/That_Phone6702 [link] [comments]  ( 9 min )
    [Discussion] Making a Tutorial for Using a New Platform for ML in the climate and earth science space
    Hey guys Looking for some ideas. I'm building out a jupyter book that will be a tutorial on how to use a research platform for data analysis and modelling. My PI has given me free liberty over it. I can not think of a good idea to do the analysis and build the model on. It does not need to be complex but should be good enough so that any researcher, student or organization using the platform can get a good idea of how to use it for ML. Any thoughts on a good area to look into? Any recommendations? Note this will be a tutorial and as such an overly complex model is unnecessary. I just can not figure out what to look into so hoping you guys could give thoughts about possible areas in climate, weather and earth science that I could focus on for the tutorial in the jupyter book. submitted by /u/AdditionalFun3 [link] [comments]  ( 9 min )
    [D] Submitting a paper rejected by EMNLP to ARR
    First time submitting to ARR here. I was quite confused about this paper resubmission thing. I got rejected by EMNLP (submission directly to EMNLP with openreview) a week ago and I am planning to resubmit it to the ARR system (also using openreview). Does this EMNLP submission count as a previous ARR submission that should be mentioned or not? Do I need to withdraw the paper from EMNLP openreview prior to submitting it to ARR openreview? submitted by /u/Icy-Distribution6887 [link] [comments]  ( 9 min )
    [D] [P] UI-based AI agents: UI-Act
    Hi! Happy to share a project I've been working on for a while: UI-Act https://github.com/TobiasNorlund/UI-Act It's an AI model architecture designed to autonomously navigate and interact with computers using the graphical user interface. Think of it as a co-pilot that "sees" your screen and acts on it, just as a human would. In essence, it's a custom transformer model taking prompt and screenshots as input, with output heads to predict low-level actions i.e. mouse clicks. In the demo, it has been trained to compute simple expressions in a calculator window, using expert demonstrations/behavior cloning. If scaled up appropriately however, it could provide a basis for a general agent to automate arbitrary tasks on a computer. I would be interested in hearing your thoughts on it, and especially with regards to the trend towards general AI agents and assistants (Windows Copilot / Adept ACT-1 / AutoGPT etc). LMs equipped with e.g. function-calling is a trendy approach, that rely on text-based state representations and APIs to take action. In cases where this is unfeasible, UI-based agents might provide a more general alternative. As the agent's interface to the computer is shared with humans, it can be easily taught using expert demonstrations, and require little or no technical expertice. Let me know what you think! submitted by /u/tobibbelfuel [link] [comments]  ( 9 min )
    [P] Learn how to make trustworthy and transparent machine learning models in Tsetlin Machine Book Chapter 7: Confidence, Trustworthiness, and Composites.
    ​ Confidence and trustworthiness of Tsetlin Machines. Hi all! Just completed a new chapter in the book An Introduction to Tsetlin Machines: https://tsetlinmachine.org Happy to receive feedback! Abstract: Collaboration can be essential to manage complex projects. One example is building a house. You then need the expertise of carpenters, plumbers, and electricians. Each profession brings unique skills to the table. Similarly, different types of Tsetlin machines can have distinct capabilities. In this chapter, you learn how Tsetlin machines can team up, allowing them to achieve more than they could on their own. The effectiveness of a team relies on recognising each member's strengths and limitations. Appreciating where your expertise stops and where your coworkers' expertise begins is crucial for effective collaboration. We first explore how Tsetlin machines can assess their competence in Section 7.1. Using the vote count from Chapter 1, you learn to measure how confident a Tsetlin machine is when it makes its decisions. It is possible to be highly confident and still perform poorly. To be trustworthy, confidence must be in line with one's capabilities. Therefore, Section 7.1 also covers how to evaluate trustworthiness. Next, in Section 7.2, you discover how to build a team of Tsetlin machines with different skills. By assessing each Tsetlin machine's confidence, you can lean on the confident ones when making decisions. The result is a Tsetlin machine composite - a construction where multiple Tsetlin machines join forces. You can think of it as a composite material, such as epoxy, which reinforces resin with fibres, making it strong, lightweight, and durable. submitted by /u/olegranmo [link] [comments]  ( 9 min )
    [R] [D] Need Peer Review: Unsupervised Learning for Student Dropout Anomaly Detection
    Hello all, Just wrapped up Task 1.1 for anomaly detection in student dropout rates. Keen for some extra eyes on it. Task Highlights: Data Pre-processing & Normalisation K-Means Clustering Gaussian Anomaly Detector Used PCA for dimensionality reduction Links to the following files: data.csv Task 1.1 - Rubric.pdf Task1.1Script.ipynb https://drive.google.com/drive/folders/17XcjEoYCrDWqf90VVNdkLAkYNdtWWwGu?usp=sharing Would greatly appreciate any feedback! Cheers! submitted by /u/Nook31 [link] [comments]
    [R] A method to assess trustworthiness of machine coding at scale
    submitted by /u/mnky9800n [link] [comments]  ( 8 min )
    [P] [vilays] Prototype Video Demo - Any Feedback from ML Engineers?
    Hi everyone, I’m thrilled to share a prototype we've been tirelessly working on. We are developing a virtualization environment for applications, specifically tailored to engineers, designers, data scientists, and researchers. In a nutshell, our platform enables users to run cloud-hosted desktop apps from any device, making it appear as if the applications are installed on their local machines, while they're actually operating on a remote server. The ultimate goal is to obliterate barriers between local and cloud execution, especially for compute-intensive workloads, thereby allowing seamless usage of High-Performance Computing software on the cloud with the scalability to adjust computing resources as per necessity. We’re here to solicit your invaluable feedback on our product video demo. Your insights will not only help us identify any blind spots and enhance our solution but also better understand the needs and preferences of our potential user base. 📽 [https://youtu.be/QR8FWRnPrXM?feature=shared] We're eagerly awaiting your thoughts and appreciate you taking the time to help us refine our product! Thank you! :) submitted by /u/aaron-cesaro [link] [comments]
    [D] Databricks Dolly 15k - Creating Synthetic Variants
    Hey all, I found Dolly to be a very interesting project when it was released but I'm curious if it has similar value today because a lot of synthetic data generation options seem to be popping up. Now it seems like Dolly is human generated/curated by over 5k employees (which is great), but wouldn't it be a better approach now to have Llama70b (or maybe Falcon) just generate future variants of 15k rows? I havent been able to figure out why we arent seeing more synthetic datasets like this on HF? Is the bottleneck licensing, compute or just incentive? Heres the original Dolly post thread: https://www.reddit.com/r/MachineLearning/comments/120usfk/r_hello_dolly_democratizing_the_magic_of_chatgpt/ submitted by /u/buzzyness [link] [comments]
    [D] Please suggest a Loss function for image to image task.
    What is the loss function that needs to be used for a task that takes an input image with a lot of haze and produces an image with reduced haze. The architecture is a simple encoder decoder architecture. I tried MSE as some articles and ML guides say that MSE is good for pixel wise comparison and also tried Categorical Crossentropy but none of them work so great. MSE works but produces artefacts like red/green/ blue spots and spatters and at worse times it produces a white image. The research on this task includes use of SIDNet[Single Image Dehazing Net], Transmission maps, Dark channel prior algorithm, FFA net, etc trained on the Benchmark datasets (RESIDE,SOTS). I aim to create a simple architecture for college project so I chose the Enc-Dec architecture. Any suggestions are appreciated. submitted by /u/Wild_Basil_2396 [link] [comments]
    [D] Startup team demonstrates differentiable Swift compiler outrunning TensorFlow by 322X
    Autonomous systems startup, PassiveLogic, assembled a differentiable computing team, to build a fast systems language with native performance differentiability. Their latest benchmark trains networks two orders of magnitude faster than PyTorch and Tensorflow. See: LinkedIn Post&dashCommentUrn=urn%3Ali%3Afsd_comment%3A(7118052434916110337%2Curn%3Ali%3Aactivity%3A7117911978106355712)) It's a collaborative effort with the Swift community and Apple's compiler team, using the Swift language as a strongly typed embedded language that performs ahead of time compilation of graph neural nets. The focus is on fusing systems programming and AI engineering into a single native high performance language, to enable typed heterogeneous inference and training. The compiler development is open sourced as part of the standard Swift package. Try it yourself at swift.org. submitted by /u/taharvey [link] [comments]  ( 9 min )
    [D] How is test-driven development implemented in the context of machine learning?
    I recently tried to refactor a previous project that I had, but I realized that after making all of the changes the performance wasn't reproducible anymore. I decided to start from scratch, make incremental changes, and make sure that the model's performance is maintained with each change. Very basic in hindsight, but I guess I was too hasty with coding. Anyway, running the full model's training and evaluation with each change is proving to take too long. I'm curious if there's any other way that people implement TDD in the context of machine learning since projects/applications tend to be more time consuming then typical applications. submitted by /u/Seankala [link] [comments]
  • Open

    Developing industrial use cases for physical simulation on future error-corrected quantum computers
    Posted by Nicholas Rubin, Senior Research Scientist, and Ryan Babbush, Head of Quantum Algorithms, Quantum AI Team If you’ve paid attention to the quantum computing space, you’ve heard the claim that in the future, quantum computers will solve certain problems exponentially more efficiently than classical computers can. They have the potential to transform many industries, from pharmaceuticals to energy. For the most part, these claims have rested on arguments about the asymptotic scaling of algorithms as the problem size approaches infinity, but this tells us very little about the practical performance of quantum computers for finite-sized problems. We want to be more concrete: Exactly which problems are quantum computers more suited to tackle than their classical counterparts, an…  ( 94 min )
  • Open

    UK Tech Festival Showcases Startups Using AI for Creative Industries
    At one of the U.K.’s largest technology festivals, top enterprises and startups are this week highlighting their latest innovations, hosting workshops and celebrating the growing tech ecosystem based in the country’s southwest. The Bristol Technology Festival today showcased the work of nine startups that recently participated in a challenge hosted by Digital Catapult — the Read article >  ( 6 min )
    Get in Gear: ‘Forza Motorsport’ Races Onto GeForce NOW
    Put the pedal to the metal this GFN Thursday as Forza Motorsport leads 23 new games in the cloud. Plus, Acer’s Predator Connect 6E is the newest addition to the GeForce NOW Recommended program, with easy cloud gaming quality-of-service (QoS) settings built in to give Ultimate members the best streaming experience. No Breaks, No Limits, Read article >  ( 6 min )
  • Open

    DeepMind 2022 'full accounts' financial report: 2022 budget: £1,081 million ($1.3b) (decreased by a fifth from 2021)
    submitted by /u/gwern [link] [comments]
    RL for non-Python environments?
    Most real world applications for RL (robotics, game dev, finance) are in not normally done in Python, yet all major RL frameworks are written in Python. Is there a good/high-performance cross-language framework to do RL in other languages like C++/.Net/Java? If not, do you think people would be interested in such a framework? ​ submitted by /u/xor24 [link] [comments]
    Reinforcement learning agents that adhere to a causal model of the problem
    Do you know any work that tries to develop RL agents that exploit some sort of high-level model of the problem (it could also be given by an expert human) to learn faster or operate on out-of-distribution scenarios? I'm particularly interested in Causal Models, but any similar thing could be interesting for me submitted by /u/fedetask [link] [comments]
    What is the intuitive explanation for using log probabilities in Policy gradient methods instead of simple probabilities? does it improve gradient descent optimization ?
    submitted by /u/aabra__ka__daabra [link] [comments]
    Why does Drq-v2 sample from replay by episode then experience?
    I've been looking at DrQ-v2 (https://github.com/facebookresearch/drqv2) recently and it samples from replay in a way that seems odd to me but may have a purpose I don't understand. They store experiences in a compressed file by episode, this makes some sense since it means they don't have to store everything in RAM and they delay disk writes until the end of the episode so they don't slow down the sim operation. On sampling, they randomly select an episode then randomly select an experience from the episode, calculating the n-step reward dynamically at sample time instead of at experience storage time. This is then fed to the model by a pytorch DataLoader. This means a _lot_ of disk reads during the optimization step which can't be ideal but I'll put that aside. What is the advantage of doing this selection by episode? It may give a better spread across episodes in each update, but I'm not sure that makes up for the potential downsides of making prioritization and other replay tricks much harder. Any ideas? submitted by /u/EDMismyO2 [link] [comments]
    Can reinforcement learning models learn to rank?
    I have a very simple observation: a list of random value state = [random.uniform(-0.2, 0.2) for _ in range(200)] reward = state * actions . The reward is not using the next state, it's using the previous state i gave to the model. So basically i already give the answer to the model, the best action is : if state > 0 action =1, if state < 0 action = -1 I tried using PPO, but it seem not learning at all. My test_env.py is here: ``` import gymnasium as gym import numpy as np from gymnasium import spaces from gymnasium.utils import seeding from stable_baselines3.common.vec_env import DummyVecEnv import random class TestEnv(gym.Env): metadata = {"render.modes": ["human"]} def __init__( self, item_count, test_steps, is_train = True, ): self.is_train = is_train self.test_steps = test_step…
  • Open

    Microsoft at VL/HCC 2023: Focus on co-audit tools for spreadsheets
    These research papers were presented at the IEEE Symposium on Visual Languages and Human-Centric Computing (opens in new tab) (VL/HCC 2023), a premier forum for design, theory, and application of computing technologies for programming, modelling, and communication. Large language models (LLMs) have revolutionized the way novice programmers and everyday computer users tap into the capabilities […] The post Microsoft at VL/HCC 2023: Focus on co-audit tools for spreadsheets appeared first on Microsoft Research.  ( 10 min )
  • Open

    Homework problems are rigged
    This post is a follow-on to a discussion that started on Twitter yesterday. This tweet must have resonated with a lot of people because it’s had over 230,000 views so far. You almost have to study advanced math to solve basic math problems. Sometimes a high school student can solve a real world problem that […] Homework problems are rigged first appeared on John D. Cook.  ( 7 min )
  • Open

    12 Generative AI Trends to Watch Out for
    The advent of generative AI is empowering everyone alike – organizations, small businesses, individuals, students, and medical professionals, to name a few. The last couple of years have been revolutionary for artificial intelligence innovation and transformation. How will 2024 shape up for AI, AI tools, and related professionals? Let’s analyze the trends that are most… Read More »12 Generative AI Trends to Watch Out for The post 12 Generative AI Trends to Watch Out for appeared first on Data Science Central.  ( 20 min )

  • Open

    Predictive AI analyzing attraction to facial features (iris Dating app)
    Top dating apps Tinder, Hinge and Bumble have all stated that they're already investing in AI to make their apps better. They're using it to verify profiles, match people based on bios and interests, and help generate profile descriptions and liven conversations. But what about machine learning on user photos? iris Dating uses AI to analyze user input in the form of liking or disliking faces ("swiping" profiles). We all know if we like blondes or brunettes, blue or brown eyes, short or long hair, beard or no beard, etc. But AI can pick up the subtlest features (proportions, distances, curvatures etc.) and build a face map. A matrix of features, if you will. It doesn't just look for a person looking like your favorite celebrity crush. It understands what you're really attracted to. From there it's an easy path: if it knows which features attract me, it can predict my level of attraction to a specific individual (specifically, their face). Find the persons with the highest predicted attractiveness (for me, not for everyone), rank them by attraction for me, and we have a potential high mutual attraction match. The two stats I have are that on average women like 55%(!) of the profiles iris picks for them; and that users have 40x higher chances of matching when they've trained the model to understand their taste. I know it takes a lot more than a pretty face to make for a great relationship, but it sure doesn't hurt to start with strong physical attraction. Missed connections on Craigslist are about just that: seeing a face you can't forget. Find me more of these "wow" faces and let's go from there. What do you think? Is it too early? Too bold? Too niche? submitted by /u/akahamlet [link] [comments]
    Superman if portrayed by different actors (as imagined by AI)
    submitted by /u/fat_n_stupid [link] [comments]
    DALL·E 3 is blocking copyrighted material. Also DALL·E 3:
    submitted by /u/Zimmax [link] [comments]
    The AI research job market shit show
    The AI research job market is going through a shakeup, with a high demand for skilled researchers and a scarcity of talent. Companies closely monitor the movements of researchers as an indicator of their ability to transition from concept to product. The market is highly competitive, with researchers being offered high salaries and compensation packages. This has led to high turnover and attrition in many companies, causing unsettledness among employees. Despite the challenges, the investment in AI research is expected to drive innovation and push the boundaries of the Transformer architecture. Source : https://www.interconnects.ai/p/ai-research-job-market submitted by /u/NuseAI [link] [comments]
    Are there any low res (pixel art) art tools?
    I'm looking for ways to create art for a game I'm creating. submitted by /u/Yenii_3025 [link] [comments]
    Inverting Transformers Significantly Improves Time Series Forecasting
    Transformers are great at NLP and computer vision tasks, but I was surprised to learn they still lag behind simple linear models at time series forecasting. The issue is how most Transformer architectures treat each timestamp as a token and fuse all the variable data from that moment. This makes two big problems: Variables recorded at slightly different times get blurred together, losing important timing info Each token can only see a single moment, no long-term dependencies So Transformers struggle to extract useful patterns and correlations from the data. Some researchers from Tsinghua University took a fresh look at this and realized the Transformer components themselves are solid, they just need to flip the architecture for time series data. Their "Inverted Transformer" (or iTransformer): Makes each variable's full history into a token, instead of each timestamp Uses self-attention over variables to capture relationships Processes time dependencies per variable with feedforward layers This simple tweak gives all the benefits we want: State-of-the-art forecasting accuracy, beating both linear models and standard Transformers Better generalization to unseen variables Increased interpretability Ability to leverage longer historical context TLDR: Inverting Transformers to align with time series structure allows them to outperform alternatives in working with time series data. Full summary. Paper is here. submitted by /u/Successful-Western27 [link] [comments]
    Best ChatGPT Plugins: Ultimate List for 2023
    submitted by /u/Senior_tasteey [link] [comments]
    The NSFW dream (truely unrestricted ai desires)
    I guess I'm looking for the impossible but does anyone know of a generator that has all of the following qualities in order of importance least to most important: Has a massive variety of styles like Womba's private discord server does. "Create variants" function like how a Womba discord personal server generator allows you to do. Generates beautiful "digital art" style images like the digital https://www.unstability.ai/ does. (Man those images are pretty) faces are really good most of the time. (It's frusterating as it looks so good but I can't seem to get any group sex poses going on.) Provides a variety of poses such as https://easywithai.com/ai-image-generators/promptchan-ai/ which also allows you to upload you own images for poses, like how I could upload a real life orgy image and as long as it could distinguish the bodies as being separate (not a big pile of limbs) it does pretty good, but lacks severely lacks in facial quality. Like a big booty girl in hyperreal style 1080P or higher resolution. (Again Womba is good here, but they are just extreme on their restrictions.) 1080P should be the minimum for any paid service as how can we truely enjoy a full screne image on anything less without it pixeling out? Doesn't cost $150/month (yes I found one that does all this but their premium subscription cost like $150/month (seduced.ai) and it's not even unlimited. I paid $90 for a full year at Womba discord unlimited but again, $150/month is just not worth it. If anyone knows of a server that has all these for around $25/month or less, please let me know. If really appreciate it. submitted by /u/russader [link] [comments]
    Can AI reference both photos to make the black and white photo the same as the colour image?
    I have a high resolution black and white print and a generic quality colour image of the same photo, that I'd like AI to look at both images and make the B&W into colour. Is this possible? submitted by /u/NikonD3X1985 [link] [comments]
    AI Morality Scenarios.
    submitted by /u/Philipp [link] [comments]
    One-Minute Daily AI News 10/10/2023
    Cybersecurity firm Avast is calling out a long-lived tool “LoveGPT,” that has haunted popular dating apps and that has been upgraded with artificial intelligence, gaining the ability to build fake profiles and manipulate unsuspecting users.[1] The outsider told the WSJ that Microsoft used AI from its partner OpenAI, which was then used to launch GitHub Copilot at $10 per month, but lost $20 per user in the average six months on average in the first 2023. Some Copilot users cost as much as $80 per month.[2] SK Telecom said on Monday that it successfully wrapped up its international AI competition of 226 teams, “Prompter Day Seoul 2023,” held in partnership with OpenAI.[3] Google DeepMind Researchers Introduce Promptbreeder: A Self-Referential and Self-Improving AI System that can Automatically Evolve Effective Domain-Specific Prompts in a Given Domain.[4] Sources: [1] https://decrypt.co/200787/lovegpt-ai-dating-apps-catfishing-hack-avast [2] https://game-news24.com/2023/10/10/microsoft-lost-20-for-every-10-copilot-ai-subscription-report-45-for-every-10-copilot-ai/ [3] https://asianews.network/skt-openai-hold-ai-competition-for-social-good/ [4] https://www.marktechpost.com/2023/10/08/google-deepmind-researchers-introduce-promptbreeder-a-self-referential-and-self-improving-ai-system-that-can-automatically-evolve-effective-domain-specific-prompts-in-a-given-domain/ submitted by /u/Excellent-Target-847 [link] [comments]
    I finally have enough ai tools and here is my complete list
    VIDEO EDITING InVideo CapCut Filmora Veed io Rotor KEYWORD RESEARCH VidiQ Summarized YT Summary CONTENT CREATION Explore Al Vidds Opus Descript Lumen5 Steve Al AUDIENCE ENGAGEMENT ManyChat TubeBuddy Canva Hootsuite ANALYTICS Vidyo Nova Al Daily Life Tools Taskade TLVD Bardeen Al Vondy Al Notion Al Chatbots Tools YatterPlus Typewise Quickchat Cohere Kaizan Coding Tools Durable Al 10Web Akkia Replit Deepcode Design Tools Flair Al Autodraw StockIMG Booth Al Clipdrop Content Creation Tools Writesonic Beautiful Al Tome Al ChatABC Steve Al Music Tools Boomy Amper Jukedeck Melodrive BrainFM Writing Tools AISEO Quillbot Writesonic Bertha Al Simplified Youtube Tools Eightify Thumbly Steve Al ClipMaker TubeBuddy Twitter Tools Tweetmonk Tribescaler Postwise Tweetlify Tweethunter Sales Tools Lavender Warmer Regie Twain Octane Marketing Tools simplified ContentEdge Copt Smith Copy Al Mutiny Research Tools Consensus Paperpal Trinka Writesonic scholarcy I'm just sharing my experiences and observations in the field of ai. LIST AND SITE submitted by /u/PerceptionPlayful469 [link] [comments]
    Write Your Next Book with These Awesome ChatGPT Prompts
    Awesome ChatGPT Prompts submitted by /u/Senior_tasteey [link] [comments]
  • Open

    [D] how to download datasets from huggingface
    Hello, first time using Google Colab and huggingface datasets. Colab notebook is easy to setup but I can't seem to figure out how to download datasets from huggingface. I am trying to download https://huggingface.co/datasets/kili-technology/plastic_in_river dataset in Colab Notebook. After reading some beginners forums, I modified the example to look like one below but it failed. from datasets import load_dataset data_files = {"train": "train.csv", "test": "test.csv", "validation": "validation.csv"} dataset = load_dataset("kili-technology/plastic_in_river", data_files=data_files) Because there's no path to the files to be downloaded. Can someone explain how to download datasets from huggingface please? Downloading builder script: 100% 3.25k/3.25k [00:00 in () 2 3 data_files = {"train": "train.csv", "test": "test.csv", "validation": "validation.csv"} ----> 4 dataset = load_dataset("kili-technology/plastic_in_river", data_files=data_files) 5 frames /usr/local/lib/python3.10/dist-packages/datasets/data_files.py in resolve_pattern(pattern, base_path, allowed_extensions, download_config) 366 if allowed_extensions is not None: 367 error_msg += f" with any supported extension {list(allowed_extensions)}" --> 368 raise FileNotFoundError(error_msg) 369 return out 370 FileNotFoundError: Unable to find 'https://huggingface.co/datasets/kili-technology/plastic_in_river/resolve/main/train.csv' submitted by /u/0ni0nrings [link] [comments]  ( 9 min )
    [D] How do byte-level language models work?
    I've recently been trying to pre-train my own small language model on the tiny-series datasets on huggingface. I also wanted to use a model similar to MEGABYTE but I don't understand how using bytes would work. The only implementation I could find from lucidrains used str(chr(max(32, token))) to decode any token (byte) to a character and put the embedding size as 256. Firstly, why 256 and not 256-32 as any values below 32 are ignored? Also, many byte-level models including this and ByteT5 mention that they can process any text sequence even in a multilingual setting, however how would that be true if we are only using one byte, would we have to move to 2 bytes or use an UNK token, and if we did use 2 bytes that would make our embedding size around 65000 which defeats sort of the point as o…  ( 10 min )
    [P] Evaluating and tuning a model when the population may change YoY and best practices for mitigating overfitting on features that correlate with time.
    Consider a predictive model that is predicting if an outcome Y will occur in Q1 2023, based on data from Q1 2022. Now, if want to predict outcomes for 2024, we must use last years data to build the model, but we are going to have some bias if there are features that vary year over year. Is the best approach in such a situation to try and tune/validate the model with other years in the hopes of mitigating any features that are correlated with a specific year? Any help would be much appreciated, as I can't find agreed upon methods. submitted by /u/unga123 [link] [comments]  ( 9 min )
    Is there a model to input anecdotal text stories as training data to return a more comprehensive story? [P]
    I have a goal and am looking for direction from others who know more than me about machine learning. I want to submit 5-10 pieces of text to a model. The text will be anecdotes from a common experience but each one from a different person’s perspective. For example, if a family visits a theme park, each family member will have a story or two about the day. Each family’s story would be a submission to the model. One person might have loved the roller coaster and can tell about the exciting parts. Another person maybe just can’t stop talking about how great he food was. Someone else maybe felt sick and complains the line at the bathroom was too long. Perhaps another family member also rode the same roller coasters as the first person but instead hated it, so would have a very different description of it than the first. All these anecdotes are submitted to the model. Then, the model can be queried. Such as, “Tell me about the theme park.” or “I love roller coasters. Tell me about the theme park.” or “I tend to overeat, tell me about the theme park.” (the model wouldn’t hype of the food, maybe it would talk about how much exercise the visitors get by walking around all day.) In this case of a theme park context, the model would have a preconception of a theme park. It would know the general concept, know of several examples or standards that it could compare this theme park against, understand it’s all for fun, etc. This type of model may be available as an API or model already and I just don’t know about it. That’d be fine, please point me towards it. Or, maybe there’s something already available but would need tweaked or customized. submitted by /u/Semper_Disco [link] [comments]  ( 10 min )
    [D] Help me learn ML easily specially in model building and EDA
    Can you give easy to understand sources and hands-on practice methodology to master ML? Help me understand build the models in and out . Thank you submitted by /u/the_mystic_1 [link] [comments]  ( 9 min )
    NSF workshop on LLMs in chemistry education [R]
    Over Feb 12-13 of 2024, the National Science Foundation (NSF) is sponsoring a workshop titled “Integrating LLMs into the Materials Chemistry Curriculum” in Golden, Colorado. We aim to explore and develop innovative ways to incorporate large language models (LLMs, e.g. GPT, ChatGPT, and Bard) into upper division chemistry laboratories and virtual lab experiences. During the workshop, participants will brainstorm and create demonstrations incorporating LLMs into the curriculum. The event will bring together folks across academia and the private sector with disciplinary backgrounds that range across chemistry, computer science, materials science, physics, and education. There is no registration fee, and we anticipate being able to cover the majority of participant travel costs thanks to NSF support. Participants early in their career (i.e., graduate students, postdoctoral scholars) are particularly encouraged to apply. If you are interested in participating in this workshop, please fill out the Google form (link below). Please feel free to distribute this invitation widely. Application: https://forms.gle/P9QdNiCuaUAHFZj29 submitted by /u/KC2792 [link] [comments]  ( 9 min )
    [P] Where to find projects to contribute to?
    Hello, I'm a developer with 6 years of experience in the mobile field, and I recently completed my master's degree in artificial intelligence (Text mining). I want to transition into the field of AI, but I need more experience with projects in the "real world," outside of academia, and I'd like to contribute to an open-source project. I looked on Github, but I ended up feeling confused and not sure where to start. P.S.: I did some research in this subreddit, but the posts about contributions seemed a bit dated. submitted by /u/Substantial_Fact_205 [link] [comments]  ( 9 min )
    [P] Image based Python + OpenCV automation, MMORPG Laghaim Auto-Fighter Bot Demo
    Video: https://youtu.be/0m12vkaoE7w ​ Detailed Medium post will follow in the upcoming days. https://medium.com/@pssdplayer submitted by /u/HistorianCrafty3514 [link] [comments]  ( 9 min )
    [D] - I have 20-30 million shopify products dataset, any ideas?
    I have collected over 20 million shopify products & had the following ideas for them: - LLM ( Finetune an llm to know how to speak ecom ) - Video bot that can make videos on those products, using their description, elevenlabs & AIFaceGen - EcomStore that will markup the products about 30% ( This will need the bot to frequently scrape, to ensure that the products are up to date ) - Selling the dataset based on fragments, like 1$ per 1k-10k records, depends on what sells. Please let me know if these are good ideas, and if someone would like to support / help me in any way ( I just need to selfhost my supabase instance, & add all the products to it & then dev can get started ) submitted by /u/AdonisCodes [link] [comments]  ( 9 min )
    [D] Best open-source AI model for QA generation from context
    As the title says I’m looking for an open-source AI model for generating question-and-answers with a correct answer option and explanation to the correct answer from the input context. So far I have tried these models, TheBloke/Llama-2-7B-GPTQ TheBloke/Llama-2-13B-GPTQ TheBloke/Llama-2-7b-Chat-GPTQ (the output is not consistent. Sometimes I get an empty response or without the correct answer option and an explanation data) TheBloke/Llama-2-13b-Chat-GPTQ (even 7b is better) TheBloke/Mistral-7B-Instruct-v0.1-GGUF(so far this is the only one that gives the output consistently. But not able to generate more than 2 QA due to max token limit of 512. Even tried setting the max token as 1024, 2048 but nothing helped) TheBloke/Mistral-7B-OpenOrca-GGUF NousResearch/Llama-2-7b-chat-hf My system configurations are: Windows 10 with 16GB GPU Additional Information: The input prompt token will be around 250-350 tokens per request. submitted by /u/gokulcv [link] [comments]  ( 9 min )
    Churn Prediction [R]
    I want to build a model to predict churn in a third party logistics company. What variables should make up my data? Any help would do. Thanks submitted by /u/DisastrousAd8814 [link] [comments]  ( 9 min )
    [D] Recommendations for CPU-Based Real-Time Vector Database Indexing and Matching?
    Hello everyone, I have a specific online vectorization use case: I'm looking to search the internet for articles, vectorize these articles along with the search queries, and then retrieve the most relevant passages from them. Currently, I have basic hosting through DigitalOcean. Could anyone recommend the most suitable vector dataset for this task? Additionally, considering my resources, is it feasible to run this system solely on CPUs? And if so, would this setup be scalable if deployed on CPUs only? submitted by /u/Traditional-Poet2746 [link] [comments]  ( 9 min )
    [R] network digital twin for cybersecurity
    Hi all, for a text work of mine I am trying to do a project based on generating digital twin of networks. My goal is to create a digital twin of a network and then work on it from a cyber security point of view. I will briefly explain what I would like to do. I am currently using software for network vulnerability scans (OpenVAS). I use this software to perform network vulnerability scans at the network level, so basically to OpenVAS I pass a network (for example 192.168.xx.xx/24) to automatically identify all the vulnerabilities that are there. The next step ( what I'd like to do and that's why I'm asking for your advice) is to create a digital twin of the newly scanned network and then perform a penetration test on this digital twin of the network, without going to stress the actual network. Ideally, I would like to pass the output of the OpenVAS vulnerability scans, routing rules, and firewall rules to some tool that will then generate for me the digital twin of the network, which will then be used for offensive cybersecurity, so exploits, privilege escalation, etc.... will be tested on this digital twin without worrying about breaking some kind of service or stressing the real network. What I am asking is, do you know of any tool that would do the trick for me? So some tool that allows me to generate a digital twin of a network by providing as input vulnerability scans (xml,json,csv etc...), routing rules, firewall rules, pcap traces etc... Do you have any references or documentation? Are you aware of any open source tools? I thank you for your helpfulness! ​ submitted by /u/Salt-Arugula-8128 [link] [comments]  ( 9 min )
    Best approach for VFX lineups using ML [Project]
    Quick intro Lineups are one of the first steps in the VFX pipeline Source: - orignal footage that was shot on set - a reference (quicktime) video from the film edit. Task: The reference shows modifications to the original footage. They can be : - timewarp (either fixed retimes like 200% speed or completely random) - transform (moved the image in x/y axis, rotation, scale, etc.) So the lineup task is to align the original footage to the reference quicktime. What I did so Far: Made a simple script in the software Nuke, using some Python and readily available tools to make it work on a simple shot. General logic is compare every frame and the associated one is the frame with the least difference between the two. This works on super simple and straightforward tasks. (can provide more info if needed). Issue: Some references are more heavily modified. They can have some muzzle flash, basic 3d objects or even some slight error introduced like a distortion applied to the image when none shouldn't so it will never be perfectly aligned. This makes the difference of the full frame higher for some frames, making the lineup wrong. (it will take the wrong frame that has no muzzle flash, because it has less difference...)Some other things to consider is that watermarks are covering the ref and the colors are not perfectly matching, can get them close enough, but there's a difference. Conclusion: Because of those issues, I'm thinking about using Machine Learning. I have next to no knowledge on the subject. I know there Is a bunch of ways to train a model, but no clue where to start, so here's my question : Which learning styles has the best potential to be able to solve this task? submitted by /u/Pretty_Customer_8113 [link] [comments]  ( 9 min )
    [R] What are some interesting research topics to study in the intersection of ML and signal processing currently?
    I will have to pick and start a research project next January for my final year. So wanted to start exploring now. I want to do something substantive and interesting enough to get published. submitted by /u/BadMeditator [link] [comments]  ( 9 min )
    [R] Mistral 7B
    submitted by /u/hardmaru [link] [comments]  ( 8 min )
    [R] Tsinghua University: Inverting Transformers Significantly Improves Time Series Forecasting
    Transformers are great at NLP and computer vision tasks, but I was surprised to learn they still lag behind simple linear models at time series forecasting. The issue is how most Transformer architectures treat each timestamp as a token and fuse all the variable data from that moment. This makes two big problems: Variables recorded at slightly different times get blurred together, losing important timing info Each token can only see a single moment, no long-term dependencies So Transformers struggle to extract useful patterns and correlations from the data. Some researchers from Tsinghua University took a fresh look at this and realized the Transformer components themselves are solid, they just need to flip the architecture for time series data. Their "Inverted Transformer" (or iTransformer): Makes each variable's full history into a token, instead of each timestamp Uses self-attention over variables to capture relationships Processes time dependencies per variable with feedforward layers This simple tweak gives all the benefits we want: State-of-the-art forecasting accuracy, beating both linear models and standard Transformers Better generalization to unseen variables Increased interpretability Ability to leverage longer historical context TLDR: Inverting Transformers to align with time series structure allows them to outperform alternatives in working with time series data. Full summary. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [R] How to train multiple models on multiple GPU's simultaneously
    Hi! The task is to train N TensorFlow/Keras models using [2, ... N] GPU's on K different datasets in parallel. It is for testing a custom pipeline, you create a pipeline, you run it on multiple different datasets and get an aggregated metric. For now I'm using a for loop but how do I do it in parallel e.g. on AWS? I googled, but surprisingly haven't found a lot of results. I looked at Apache AirFlow because I'm vaguely familiar with it but so far I couldn't get a definite answer on how it works with multiple GPU's. Second option I found is to use Ray library. Is it worth trying? What should I use to solve this task? Thanks. UPD. I'd also consider a PyTorch solution as a backup option. UPDUPD. Jesus, why Reddit removing newlines after edit? submitted by /u/Disastrous_Sky9468 [link] [comments]  ( 9 min )
    [D] How important is having a great team when ML solutions are slow to be adopted ? When to move on?
    My team and managers are so easy to be with. Very grateful for that. The pay is okay. 150k/yr TC in Midwest. Hard for me to make a switch given how much I am appreciated. I almost feel spoiled when it comes to flexibility. I have overachiever tendency and the pace is so slow in adopting my ML models. I am the “lead”/senior data scientist in an R&D supporting scientists decision making with machine learning. Importantly, I am in a huge multinational consumer product company and I am not in the Data science organization, I bridge between the two and the data science expert on the team. I have developed the domain expertise and I have a PhD in an applied computational field with 5 years experience . I am not as challenged with getting deeper into complex stats, I have been really honing the soft skills of communication, influencing etc so getting comfortable in a senior role. Also I have been growing as a ML engineer building my own pipelines and deploying my models on prem server that they bought for me. I am not sure how greener it is on the other side, how do senior folks approach deciding when to move on? Any input is much appreciated. submitted by /u/Diligent_Trust2569 [link] [comments]  ( 9 min )
    [D] [P] [R] What to do when your model isn't testing well?
    I have 200k observations overall. I split my data into training and test set. My target variable has low prevalence ~ 9% so I tried random oversampling, random undersampling and SMOTE. After I fit my models, I tested them on my training test and the results were awful. I mean I've never had a model with 50% roc-auc, but then again, I rarely developed ML models. I'm wondering what the next steps would be? I understand there could be some sort of overfitting. But what would you do next? Any references would be appreciated :) submitted by /u/Actual-Muscle-9846 [link] [comments]  ( 9 min )
    [D] Fastest lipsync projects?
    Given an image, and an audio file (TTS generated), what is current fastest library that can output me a video of a talking image with the audio on it? I have made some research and I have seen Wav2Lip and SadTalker. Any better options? I am looking for processing speed and for the lesser hardware intensive solution for a side project. Thanks! submitted by /u/reddit2vid [link] [comments]  ( 9 min )
    [P] LoopQuest, A Github-like platform to host simulation environments for AI training
    Hello everyone! Here is my pet project, https://www.loopquest.ai/. I am trying to build a platform like Github to let people upload their simulation environments so people can train their AI agents by interacting with the environments created by others. Here is a 2-min demo, https://youtu.be/d53NFjkU7JA. It is not launched yet but would love to get some early feedbacks. Here is the corresponding Github repo https://github.com/LoopMind-AI/loopquest. For now, the package can log env-agent interaction data by adding one extra line of code. You can think of it similar to https://github.com/google-deepmind/envlogger but with much better backend and frontend support. Any feedbacks are appreciated :) submitted by /u/jxx123 [link] [comments]  ( 9 min )
    [D] Why async gradient update doesn’t get popular in LLM community?
    The pipedream-2bw paper and the Zero-offload paper both show that 1-step delayed asynchronous gradient update doesn’t affect the convergence (and perplexity) while improve the training efficiency (by fully utilize the bubbles in pipeline parallelism) at a large margin. However, both the Megatron-LM and the DeepSpeed don’t use pipedream-2bw scheduling. Could anyone share me some insights or ideas about why such an efficient scheduling scheme doesn’t get popular in the LLM pretraining community? Does it suffer convergence/accuracy issue in practice? Or are there any other concerns that blocking it become the default / most popular pipeline parallelism scheduling? (I posted the same question in hacknews as well: Why async gradient update doesn't get popular in LLM community? | Hacker News) I have tried to implement the pipedream-2bw scheduling scheme on Megatron-LM and do can reproduce the performance gain as well as loss convergence with GPT-2 345M using 8xV100 GPUs: https://github.com/sighingnow/Megatron-LM/blob/ht/dev-pipe/megatron/core/pipeline_parallel/schedules.py#L1421 submitted by /u/sighingnow [link] [comments]  ( 9 min )
    [D] IDE?
    What’s the best IDE to work with or is it on user needs that determines best fit or is their one top dog and dominator that can robustly if not better preform other IDE’s ? submitted by /u/External_Age_5855 [link] [comments]  ( 9 min )
  • Open

    Neural Networks From Scratch in Rust
    submitted by /u/zezeartix [link] [comments]  ( 8 min )
    Activation function for generating Shapley values
    Hi, I want to train a neural network to calculate Shapley values based on a given characteristic function. Depending on a given characteristic function, calculated through a dedicated algorithm, Shapley values can be any number, positive or negative, without a set range. Because of this, I am unsure, for the specific application of calculating Shapley values, what activation function to use in a neural network that would calculate them. The relu function, as well as leaky relu function, either cannot give values that are negative or have trouble giving large negative values, and sigmoid or tanh can only give values in a certain range. I am aware that there are other commonly used activation functions, but all the ones I could find had one of these issues, which would make training a network to calculate Shapley values difficult. Any advice? submitted by /u/PowNotBigSurprise [link] [comments]  ( 9 min )
    A hugging face implementation for style gan to produce user avatar
    I was thinking to create an app based on style gan which will include facebook , instagram theme and style transfer it with profile pic so shall i create this app or not .I want to know if it will be good idea. submitted by /u/No_Claim_8651 [link] [comments]  ( 9 min )
  • Open

    Improve performance of Falcon models with Amazon SageMaker
    What is the optimal framework and configuration for hosting large language models (LLMs) for text-generating generative AI applications? Despite the abundance of options for serving LLMs, this is a hard question to answer due to the size of the models, varying model architectures, performance requirements of applications, and more. The Amazon SageMaker Large Model Inference […]  ( 13 min )
    Index your web crawled content using the new Web Crawler for Amazon Kendra
    In this post, we show how to index information stored in websites and use the intelligent search in Amazon Kendra to search for answers from content stored in internal and external websites. In addition, the ML-powered intelligent search can accurately get answers for your questions from unstructured documents with natural language narrative content, for which keyword search is not very effective.  ( 7 min )
  • Open

    Python code for means
    The last couple article have looked at various kinds of mean. The Python code for four of these means is trivial: gm = lambda a, b: (a*b)**0.5 am = lambda a, b: (a + b)/2 hm = lambda a, b: 2*a*b/(a+b) chm = lambda a, b: (a**2 + b**2)/(a + b) But the arithmetic-geometric mean […] Python code for means first appeared on John D. Cook.  ( 5 min )
    More ways of splitting the octave
    in an earlier post I said that the arithmetic mean of two frequencies an octave apart is an interval of a perfect fifth, and the geometric mean gives a tritone. This post will look at a few other means. Intervals The harmonic mean (HM) gives a perfect fourth. The arithmetic-geometric mean (AGM) gives a pitch […] More ways of splitting the octave first appeared on John D. Cook.  ( 6 min )
    Maclaurin’s inequality
    This afternoon I wrote a brief post about Terence Tao’s new paper A Maclaurin type inequality. That paper builds on two classical inequalities: Newton’s inequality and Maclaurin’s inequality. The previous post expanded a bit on Newton’s inequality. This post will do the same for Maclaurin’s inequality. As before, let x be a list of real […] Maclaurin’s inequality first appeared on John D. Cook.  ( 5 min )
    Newton’s inequality and log concave sequences
    The previous post mentioned Newton’s inequality. This post will explore this inequality. Let x be a list of real numbers and define Sn(x) to be the average over all products of n elements from x. Newton’s inequality says that Sn−1 Sn+1 ≤ S²n In more terminology more recent than Newton, we say that the sequence […] Newton’s inequality and log concave sequences first appeared on John D. Cook.  ( 5 min )
  • Open

    Research Focus: Week of October 9, 2023
    Research Focus: Principal researcher Lester Mackey recognized for pioneering statistical and ML techniques; Pareto frontiers in neural feature learning; structural inequality in the influencer industry; new research on cardinality estimation. The post Research Focus: Week of October 9, 2023 appeared first on Microsoft Research.  ( 9 min )
  • Open

    Take the Wheel: NVIDIA NeMo SteerLM Lets Companies Customize a Model’s Responses During Inference
    Developers have a new AI-powered steering wheel to help them hug the road while they drive powerful large language models (LLMs) to their desired locations. NVIDIA NeMo SteerLM lets companies define knobs to dial in a model’s responses as it’s running in production, a process called inference. Unlike current methods for customizing an LLM, it Read article >  ( 6 min )
  • Open

    Gain and bias params in Mujoco
    Hi! I'm new to Mujoco and robot dynamics. When I read the Mujoco document, I'm confused about the gainprm and biasprm parameters. I want to understand the meaning of these parameters and tune the actuation speed of my actuator. An easy-to-understand explanation or supporting material would be appreciated. Thanks in advance. submitted by /u/UpperSearch4172 [link] [comments]
    LoopQuest, A Github-like platform to host simulation environments for AI training
    Hello everyone! Here is my pet project, https://www.loopquest.ai/. I am trying to build a platform like Github to let people upload their simulation environments so people can train their AI agents by interacting with the environments created by others. Here is a 2-min demo, https://youtu.be/d53NFjkU7JA. It is not launched yet but would love to get some early feedbacks. Here is the corresponding Github repo https://github.com/LoopMind-AI/loopquest. For now, the package can log env-agent interaction data by adding one extra line of code. You can think of it similar to https://github.com/google-deepmind/envlogger but with much better backend and frontend support. Any feedbacks are appreciated :) submitted by /u/jxx123 [link] [comments]

  • Open

    [D] On-Chain Reputation Model
    I am relatively new to machine learning, and I am thinking about building an on-chain reputation ML model. Here is how far I have gone in my ideation phase, can someone help with some suggestion on how I can approach this issue. Input data could include on-chain activity like number of transactions, value transferred, smart contracts interacted with, tokens held, NFTs owned, etc. Additionally, data from off-chain sources could be incorporated like identity verification, credentials, ratings, reviews, social media profiles, etc. Supervised learning algorithms like regression or classification models could be used to predict a reputation score. The target variable would be some verified reputation rating. Models like linear regression, random forests, or neural networks could work. Choice depends on size of data and complexity needed. Model would need to be transparent and parameters verifiable on-chain for validity. So linear models or simple neural networks may be most practical initially. The model could be trained off-chain initially but ultimately parameters and logic stored on-chain. Predictions could also be verified on-chain. Careful feature selection is important so the model relies on signals that are resistant to manipulation and capture true reputation. The model would need continuous updates as new data comes in reflecting latest reputation. This would require clear on-chain governance. Issues like privacy, collusion resistance, and censorship resistance would need to be addressed through crypto mechanisms like zero-knowledge proofs. P.S. This is a personal project I want to attempt to level up my ML skills. submitted by /u/AdParticular2891 [link] [comments]  ( 9 min )
    [D] Pivoting jobs to ML
    Hi everyone, I recently started a job as a Junior Data Engineer. I have learned a lot so far working with DBT, Snowflake, Looker, Jira workflow, and Git using SQL and Python. I plan to stay at this company for 2 years. My boss has assured me that if I work hard I will progress from a Junior to full Data Engineer. After 2-3 years as a DE, I want to level up and move towards Data Science/ ML roles. My questions are: What other skills should I learn to enable me to pivot into something ML related? Should I find a job as a Data Scientist first, then try for ML jobs? Just looking for some advice/suggestions. Thanks! submitted by /u/SydeFxs [link] [comments]  ( 9 min )
    Problem solving in programming [D]
    Hello Redditors, I am a student who is currently studying Bachelor of Science in AI. I have a question regarding improving my coding skills. I am aiming for a research internship and I don't know where to start. I previously took a summer school that taught me a lot about state-of-the-art models such as GANs, Transformers, VAEs, GNNs, etc. I would like to improve my coding skills, specifically problem-solving and writing clean code. I have experience with deep learning in general and data analysis. I am looking for a research internship next summer. Where should I start? I plan to review some of the deep learning material in the Deep Learning Specialization before taking the GAN specialization. However, when it comes to coding, I want to think like a software engineer or a great programmer. What do you guys suggest for improving my coding or problem-solving skills? I'm feeling confused with multiple resources and I don't know where to begin. I’d really appreciate your help. submitted by /u/misplacedlion [link] [comments]
    Random forest trained on insider trades [D]
    Would be very appreciative if someone looked at these results and pointed out potential / actual flaws. Dataset basics: insider trade details, insider trades over the last month, insider trades over the last week, (…) stock return over the last month (…), 46 columns total. Labels… 0: -5% + 5% Dates predicted: reported date. Usually 2-3 days behind transaction. Also, not positive if results are significant in the first place so that would be a great call out as well. Colab notebook: https://colab.research.google.com/drive/1fO1hVsVMWN3TORNj4OQn5UbWQOeug4fi?usp=sharing submitted by /u/This_Cardiologist242 [link] [comments]  ( 9 min )
    [R] ALMT: Using text to narrow focus in multimodal sentiment analysis improves performance
    Multimodal sentiment analysis combines text, audio and video to understand human emotions. But extra inputs can add irrelevant or conflicting signals. So filtering matters. Researchers made a "Adaptive Language-guided Multimodal Transformer" (ALMT) that uses text to guide filtering of visual and audio data. This creates a "hyper-modality" with less noise that complements the text. They tested it on datasets like MOSI (YouTube reviews), MOSEI (YouTube clips) and CH-SIMS (Chinese videos). ALMT achieved improved accuracy: MOSI: YouTube movie reviews with 2,199 samples. ALMT achieves state-of-the-art performance on various metrics including 6% higher 7-class accuracy. MOSEI: 22,856 YouTube clips covering sentiment-rich scenarios. ALMT improves multi-class accuracy by 3-5% over previous methods. CH-SIMS: Chinese dataset with over 2,000 video samples. ALMT surpasses prior work by 1.4% in binary accuracy. Analyses showed big drops in performance without the guided filtering, so this validates that it's the main innovation. Downsides are it needs lots of training data and has minor gains on sparse regression metrics. But overall the technique of filtering multimodal data under text guidance gives improvements. The concepts feel intuitive - use dominant signals to filter others and retain useful complements. My guess is it would transfer well to other multimodal tasks. TLDR: New way to filter multimodal data for sentiment analysis using text guidance improves performance. Shows the value in removing distracting signals. Sometimes less is more. Full summary here. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    Has anyone evaluated Tiktok's algorithm for their recsys use case? [D]
    As a disclaimer, I am not familiar with many Recsys benchmarks. So I know Tiktok published a white paper on their purported algorithm, Monolith, but it is unclear if that is what they use in their products or not. Given, recommender systems seem to be core to Bytedance's business, I imagine they wouldn't provide many details. Has anyone evaluated Monolith on their own products and seen an improvement? I think the app is impressive and am wondering how it has transferred to other use cases. ​ submitted by /u/HybridRxN [link] [comments]
    [P] Optimistix, nonlinear optimisation in JAX+Equinox!
    Hi everyone! I wanted to advertise my new JAX optimisation library Optimistix! Optimistix has high-level APIs for minimisation, least-squares, root-finding, and fixed-point iteration and was written to take care of these kinds of subroutines in Diffrax. Here is the GitHub: https://github.com/patrick-kidger/optimistix The elevator pitch is Optimistix is really fast, especially to compile. It plays nicely with Optax for first-order gradient-based methods, and takes a lot of design inspiration from Equinox, representing the state of all the solvers as standard JAX PyTrees. For those familiar with classical nonlinear unconstrained optimisation, Optimistix does some pretty nifty new things. It introduces new abstractions for modular optimisers, allowing users to mix-and-match different optimisation techniques easily. For example, creating a BFGS optimiser with Levenberg-Marquardt style Tikhnov regularisation takes less than 10 lines of code in Optimistix. I'm using Optimistix as a tool for my own research, and continue to work on it as part of my PhD (supervised by Patrick Kidger.) I would love for some more people to try it, so let me know what you think! submitted by /u/packquickly [link] [comments]  ( 9 min )
    [D] Document layout - recreating the structure
    Hello, Document layout analysis has been a great tool so far to extract the components of a document (title, paragraph, tables ...). I'm working on long text PDF which are mostly scanned documents. One of the process involved after document layout analysis, is to recreate the document structure: creating sections, sub section, sub sub sections and so on. As of today, this task is done by parsing the title and finding out any ordering information (numeric, alphabetical or roman notation): 1. Title A 1.1 Title B 2. Title C 2.a) Title D This technique works only if a document follows this constraint (numeration). I want to go one step further, where the algorithm could create the document structure with any title ordering information. I believe that relying only on parsing cannot do the trick. What could be the options? Given that the only features are: title's text and title's position (x,y) in the document. I was wondering if a model like a seq2seq could fit this problem, or should I stick with an engineering rule based approach. Thanks ​ submitted by /u/mathrb [link] [comments]  ( 9 min )
    [R] Is there an enstablished method to test if something has been memorized / seen by black-box LLMs?
    I am using ChatGPT and other LLMs for which the training data is unknown. I am using them to test a set of MC question from a medical test published after the models knowledge cutoff. However, I cannot be 100% sure the questions were not on the internet beforehand. Is there any established method or testsuit to try to understands weather a given instance has been seen at training time? All I can think is looking at memorization or at perplexity, but I was looking for a more out of the box methodology that people use. It seems to me that the problem is quite general. Thanks! Edit: I know LLMs do not just memorize things and learn pattern. However, there is research on trying to understand if a datapoints has been used in training or not. Eg there is research that tries to exploit the fact that seen text has normally lower perplexity than unseen text or other similar infornation. I was wonderibg what the state in this topic is and if something is normally used as a score to have some clues. I do not expect to be able to retrieve the exact same questions lol submitted by /u/ombelicoInfinito [link] [comments]  ( 9 min )
    [D] Extracting Multi-modal embeddings (Image + text) to be used for visual similarity purposes
    I am looking for methods/frameworks to extract multi-modal embeddings from images and text for similarity search purposes. The problem setup is slightly different from how CLIP style methods are generally used ( where similarity between text and image embeddings obtained through the model are computed to assess how similar a caption is to an image). My intended application is similarity search, where I want to find entries of images and captions pair similar to a piece of the query image and caption encoded together. Some approaches I tried: I tried concatenating the textual and visual embeddings obtained from CLIP and ResNET with textual embeddings and using it with cosine similarity, but it had limited utility. My guess is that concatenating two modalities merely without any training would yield very little utility. The next direction could be to train a model to fuse the embeddings obtained, but my dataset size is really small (10 thousand total), so not sure if training a model would be helpful. Are there any approaches that can allow me to combine the multi-modal embeddings for similarity purposes, similar to how pre-trained ResNET or Inception can be used off-the-shelf for retrieving visually similar images? Any pointers/advice would be greatly appreciated. submitted by /u/No-Commission3556 [link] [comments]  ( 9 min )
    [P][D] Building Datasets
    In my ML/AI journey up until now, most training and hands-on labs either use a pre-built dataset or have you build a pretty simple and flat dataset. I am now looking to stretch my exploration into some real-world use cases and find the data I want is way more complex. Researching online feels like the meme on learning to draw an owl. So I'm looking for some guidance on how to handle my data. The data is an array from a rest API that includes all alarms from an application as nested objects. So the data looks like this for a single event: data = { "event_data": [ { "root_cause": "Root cause added after API calls" "alarms": [ { "alarm_id": "alarm_id", "alarm_name": "alarm_name", "alarm_type": "alarm_type", "alarm_description": "alarm_description", "alarm_details": { pro1: val1, prop2: val2, etc... }, "actual_alarm_value": { any_random_key: "any_random_value", etc... }, } ], } ] } I need to build a dataset that includes many of these events with the ultimate goal of predicting future events. I plan to test this against various ML models and LLMs. Each event would be a single row, and I would flatten out each alarm so each nested property has its own column. Where I need clarification is how to handle the flatting of alarms. If I fully flatten them, it appears like I lose the context of the alarm's parent event. But if I only flatten them to the alarm level, I lose each property having its own row Also actual_alarm_value is very random, so my thinking is to use string encoding here. I know this is a lot of detail, and I appreciate any and all advice and help in learning how to do this. submitted by /u/that1guy15 [link] [comments]  ( 9 min )
    [D] Is there a REST API for text embeddings?
    I'm aware there are commercial offerings like OpenAI and cohere with the embedding API. But what about for open source models like the ones from SentenceTransformers? I'm aware you can use the HuggingFace inference API, but it's probably not best for commercial use, in which case the Inference endpoints would be better, but it's quite pricey for a startup with no customers. I also know I could use some kind of serverless GPU / inference platform to create my own API. But is there just a straight-up REST API for getting text embeddings from a model via SentenceTransformers or other HuggingFace models? submitted by /u/TheSaasDev [link] [comments]  ( 9 min )
    [D] Langauge Confusion.
    I am a Second Year Student I'm planning to start learning ML which obviously requires python. But at the same time I wanna start practicing DSA / competitive programming as well. I'm sorta in this dilemma of what to do. Since python is a must for ML I'm 100% doing it, but for DSA I am confused whether I should learn DSA in Python or C++. People say C++ is the best and ideally I should do that. But python suits my need more. Obviously I don't mind doing both languages together but it seems a bit redundant. P.S: I'm learning DS basics in college via C language so learning the basic concepts isn't an issue. What do you suggest? submitted by /u/No-Discipline-2354 [link] [comments]
    [Project] I created a tool that navigates the Internet and scrapes data using GPT-4
    Hi! I created a universal data API that uses headless browsers and GPT to extract any data from the web in JSON format. I started this project because I needed some API to do data enrichment to get company data (headcount, investment rounds, etc.). Once I did the first version, I quickly realized that there can be many use cases for such a tool: data enrichment, web scraping, data validation, etc. You can get the early access to the API here: https://singleapi.co/ Thanks! submitted by /u/semanser [link] [comments]  ( 9 min )
    Applied AI/ML/ Data Science MS in Germany [D]
    Hey folks, I graduated from a tier 2 college in India with an ECE degree and then started working as an ML engineer in a mid-size startup 2 years ago. (1 year of internship + 1 year of Full time employment at the same company). Now, I am looking to get a Master's Degree in AI/ML/DS in Germany starting Winter 2024. I am a person with interests in Industry skills(Applied AI/ML) rather than the research/academia part as I don't wish to pursue a PhD nor do I want to be stuck in a Math-deep subject that may not be relevant for me in the future. On account of this, I wanted to know which college/degree offers the best balance in-between theory and applied AI/ML/DS. Also, people have been telling me that exams are super tough and it is hard to successfully complete an AI/ML/Data Science MS degree in Germany, Is it true? It has been super discouraging for me to hear this and is affecting me mentally to go through the application process. PS. CS/Electrical Degrees with good electives for AI/ML/DS are also good enough for me (Just hoping the coursework/grading is not too harsh) Also, it would be great if someone could clarify if an Electronics and Communications student can apply for a CS degree in Germany. Sorry for asking too many questions, TIA. :) submitted by /u/TheDivineKnight01 [link] [comments]  ( 9 min )
    [D] Prompting as searching through a space of vector programs
    Enlightening article from Francois Chollet about #LLMs and embeddings "Prompt engineering is the process of searching through program space to find the program that empirically seems to perform best on your target task." ​ https://fchollet.substack.com/p/how-i-think-about-llm-prompt-engineering submitted by /u/alexisperrier [link] [comments]  ( 9 min )
    [D] Best approach to verify 4 million sentence-named entity pairs ?
    I have a dataset of about 4 million pairs of sentence-named entity. Looks like this: Sentence: MarketWatch has reached out to Charles Schwab and GQG for comment. Corresponding NER Tags: [{'end': 6, 'entity': 'B-ORG', 'index': 1, 'score': '0.98322886', 'start': 0, 'word': 'Market'} {'end': 7, 'entity': 'I-ORG', 'index': 2, 'score': '0.969261', 'start': 6, 'word': '##W'} {'end': 11, 'entity': 'I-ORG', 'index': 3, 'score': '0.97644824', 'start': 7, 'word': '##atch'} {'end': 38, 'entity': 'B-PER', 'index': 8, 'score': '0.9927636', 'start': 31, 'word': 'Charles'} {'end': 41, 'entity': 'I-PER', 'index': 9, 'score': '0.99394774', 'start': 39, 'word': 'Sc'} {'end': 44, 'entity': 'I-PER', 'index': 10, 'score': '0.41437265', 'start': 41, 'word': '##hwa'} {'end': 45, 'entity': 'I-PER', 'index': 11, 'score': '0.46933985', 'start': 44, 'word': '##b'} {'end': 51, 'entity': 'B-ORG', 'index': 13, 'score': '0.9984176', 'start': 50, 'word': 'G'} {'end': 52, 'entity': 'I-ORG', 'index': 14, 'score': '0.99367344', 'start': 51, 'word': '##Q'} {'end': 53, 'entity': 'I-ORG', 'index': 15, 'score': '0.99617106', 'start': 52, 'word': '##G'}] What would be a good approach to verify the correctness of each item? submitted by /u/shardblaster [link] [comments]  ( 9 min )
  • Open

    MusicGPT: Create unique music from text prompts
    submitted by /u/SaucySporky [link] [comments]
    Website to do the Following: I Give it a Design and Create an Image With it
    Hello all, I am not sure this is out yet. I would like to find a website where i can upload an image I own, and have it generate another image around it. Let's say I have some shirts that say 'HOLA'. I would want, for example, to generate an image of Socrates wearing said shirt. Is this possible? If so, which site would allow me to do this? ​ Cheers and merci! submitted by /u/JYanezez [link] [comments]
    So far, AI hasn't been profitable for Big Tech
    Big Tech companies like Microsoft and Google are grappling with the challenge of turning AI products like ChatGPT into a profitable enterprise. The cost of running advanced AI models is proving to be a significant hurdle, with some services driving significant operational losses. Corporate customers are unhappy with the high running costs of AI models. The nature of AI computations, which require new calculations for each query, makes flat-fee models risky. Some companies are trying to dial back costs, while others continue to invest more deeply in AI tech. Microsoft's GitHub Copilot, which assists app developers by generating code, has been operating at a loss despite attracting more than 1.5 million users. One of the reasons AI services are costly is that some companies have been reaching for the most powerful AI models available. Microsoft has been exploring less costly alternatives for its Bing Chat search engine assistant. Advances in AI acceleration hardware may eventually reduce the costs of operating complex models. Experts anticipate a more stringent financial approach in the near future, transitioning from experimental budgets to focusing on profitability. Source : https://arstechnica.com/information-technology/2023/10/so-far-ai-hasnt-been-profitable-for-big-tech/ submitted by /u/NuseAI [link] [comments]
    Dubbing By ElevenLabs. Share your fav videos in your native language!! Go try
    submitted by /u/ShooBum-T [link] [comments]
    The environmental impact of the AI revolution is starting to come into focus
    The environmental impact of the AI revolution is starting to become clear, with generative AI like ChatGPT increasing Google Search's energy use more than tenfold. The worry is that the computing power required for AI could lead to increased energy consumption and carbon footprint of data centers. AI already accounted for 10 to 15 percent of Google's electricity consumption in 2021. Google claims that the energy needed to power AI technology is increasing at a much slower rate than predicted, and they are implementing practices to reduce the carbon footprint of AI workloads. The worst-case scenario of Google Search using as much electricity as Ireland is unlikely, but the potential energy consumption of AI servers could grow significantly if AI's popularity continues to rise. Source : https://www.theverge.com/2023/10/10/23911059/ai-climate-impact-google-openai-chatgpt-energy submitted by /u/NuseAI [link] [comments]
    ALMT: Using text to narrow focus in multimodal sentiment analysis improves performance
    Multimodal sentiment analysis combines text, audio and video to understand human emotions. But extra inputs can add irrelevant or conflicting signals. So filtering matters. Researchers made a "Adaptive Language-guided Multimodal Transformer" (ALMT) that uses text to guide filtering of visual and audio data. This creates a "hyper-modality" with less noise that complements the text. They tested it on datasets like MOSI (YouTube reviews), MOSEI (YouTube clips) and CH-SIMS (Chinese videos). ALMT achieved improved accuracy: MOSI: YouTube movie reviews with 2,199 samples. ALMT achieves state-of-the-art performance on various metrics including 6% higher 7-class accuracy. MOSEI: 22,856 YouTube clips covering sentiment-rich scenarios. ALMT improves multi-class accuracy by 3-5% over previous methods. CH-SIMS: Chinese dataset with over 2,000 video samples. ALMT surpasses prior work by 1.4% in binary accuracy. Analyses showed big drops in performance without the guided filtering, so this validates that it's the main innovation. Downsides are it needs lots of training data and has minor gains on sparse regression metrics. But overall the technique of filtering multimodal data under text guidance gives improvements. The concepts feel intuitive - use dominant signals to filter others and retain useful complements. My guess is it would transfer well to other multimodal tasks. TLDR: New way to filter multimodal data for sentiment analysis using text guidance improves performance. Shows the value in removing distracting signals. Sometimes less is more. Full summary here. Paper is here. submitted by /u/Successful-Western27 [link] [comments]
    AI Political Analysis Test: Describe the commonalities between the Palestinians fighting for a homeland and the American Revolution.
    The Palestinians fighting for a homeland and the American Revolution have a number of commonalities, including: Fighting against a colonial power:** Both the Palestinians and the American colonists were fighting against a colonial power that had imposed its rule on them. In the case of the Palestinians, the colonial power is Israel, which was established in 1948 on land that had been inhabited by Palestinians for centuries. In the case of the American colonists, the colonial power was Great Britain. Fighting for self-determination:** Both the Palestinians and the American colonists were fighting for their right to self-determination, or the right to govern themselves. The Palestinians want to establish their own independent state, while the American colonists wanted to break away from Gr…
    I made "Pi: your personal IA" to have an opinion.
    submitted by /u/LonePrron [link] [comments]
    IBM CEO: Washington should hold tech firms accountable for AI
    submitted by /u/smo279 [link] [comments]
    Automated my Youtube Channel Using GPT 4
    Hi Everyone, I have automated the content creation for my youtube channel. It got total views of 8.5K and some videos getting 2.5K views. https://www.youtube.com/channel/UCG0-UemyRMUs1JJlQMK9lzA All the things are automated like:- Script Generation Voiceover Image Generation Subtitles I do minor tweaks here and there but majorly its automated. I posted is somwhere and people were commenting what's the use of the mindless videos? This is the begining, I want to automate the editing of videos. User can upload raw videos and I should be able to give multiple final edit videos. I have built a small tool blinkcuts.com, If anyone intersted. I can give access. Please DM for access. submitted by /u/raxrb [link] [comments]
    Saudi-China collaboration raises concerns about access to AI chips
    Saudi-China collaboration raises concerns about access to AI chips. The trial period includes complete digital access to FT.com with everything in both the Standard Digital and Premium Digital packages. At the end of the trial, users will be auto-enrolled in the premium digital monthly subscription plan for $69 per month. Payment can be made through credit card, debit card, or PayPal. Source : https://www.ft.com/content/2a636cee-b0d2-45c2-a815-11ca32371763 submitted by /u/NuseAI [link] [comments]
    Looking for the free AI tool which removed the noise from the video:
    Hey, I am looking for the free AI tool which removed the noise from the video. If there is any, do suggest. Thank You in Advance. submitted by /u/Haziq12345 [link] [comments]
    Looking for the free AI tool which removed the noise from the video:
    Hey, I am looking for the free AI tool which removed the noise from the video. If there is any, do suggest. Thank You in Advance. submitted by /u/Haziq12345 [link] [comments]
    How do AI-driven demand forecasting models handle market volatility and unexpected events, such as economic crises or pandemics?
    If you have any resources then do share. submitted by /u/Cygnet-Digital [link] [comments]
    AI Power Distribution Scenarios.
    submitted by /u/Philipp [link] [comments]
    As drone traffic increases, researchers turn to AI to help avoid collisions
    submitted by /u/Tao_Dragon [link] [comments]
    Is this a viable approach for a small plant manufacturing engineer?
    I'm a small plant engineer who covers manufacturing, process, quality, and new product design. I wear many hats in my job and it's a lot of responsibility. One way I've attempted to tame the complexity is by using good reference books. I've accumulated quite the collection through the years. Some print others digital. I've also got a lot of digital notes. And that's a lot of data. I've been playing around with sharly.ai (thanks to this sub for recommending) and uploading documents to it and querying them. Its been able to find the information every time it's been available. And more importantly it's provided sources and page numbers. This is important, since I've never been able to find a conversational AI that gives me consistently good answers (including the latest chatgpt), and I always need to read deeper. I also need to backup my work. So in this way it's basically a super index. I also bought a tablet for note-taking and basic sketches. The idea is to use the tablet to take notes, hold my library for reading, and interact with sharly.ai. Is this approach good enough, or is there something else I can do? submitted by /u/Aggressive_Ad_507 [link] [comments]
  • Open

    Issue with MuJoCo Simulation: Robot Penetrates the Ground
    Hello everyone, I'm working on simulating a modified humanoid robot, "DARwIn OP 3", using MuJoCo through dm_control in Python. My goal is to train the model to ascend stairs rapidly but these are the first steps. However, I've encountered a problem where the robot appears to sink into the ground and is then ejected with significant force under specific conditions. ​ https://reddit.com/link/174vpzw/video/u636vf49tftb1/player Environment: MuJoCo via dm_control. Issue Description: When the robot falls and its feet move, it behaves as though one of its motors sinks into the floor. Attempts: I've tweaked contact parameters and ground properties with no luck. Interestingly, this doesn't occur in the standalone MuJoCo simulator. Visual Aid: I've attached a video to illustrate the problem…
    Algorithms for average reward reinforcement learning in continuous/general state-action space
    I see that discounted reward reinforcement learning has been extensively studied in the literature. However, the average reward metric receives less attention, and it looks like algorithms for this metric (R-learning, H-learning, SMART, etc.) are much less than the discount metric. Could you suggest any algorithms for average reward reinforcement learning for continuous/general state-action space? submitted by /u/S1gnature [link] [comments]
    "How Disney Packed Big Emotion Into a Little Robot" (sim2real)
    submitted by /u/gwern [link] [comments]
    I took OpenAI's paper about defeating Dota2 world champions, and explained it paragraph-by-paragraph.
    submitted by /u/mngrwl [link] [comments]
    What's your view on the recent RT-X efforts/scaling via IL?
    With recent RT-X efforts from Deepmind, it seems the community has been shifting towards the development of a more generalized foundational model, combining with visions and languages, and scaling via imitation learning. I know RL algorithms are expensive to train and hard to scale due to the way the samples are generated, but I am still fascinated by the intelligence behind their philosophies. What do you think the future would look like? Like NLP or CV, having a big foundational model pre-trained via IL, and fine-tune on different tasks via RL? How can we tell if a task is simple enough that we don't need to leverage the power of a foundational model? submitted by /u/Old_Reading_669 [link] [comments]
  • Open

    U statistics and a new paper by Terence Tao
    Terence Tao has a new paper out that relates to a couple things I’ve written about recently. Elementary symmetric polynomials came up when developing the general equations for tangent sum and hyperbolic tangent sum. The latter post goes into more detail. Before that, means of symmetric functions, not necessarily elementary polynomials or even polynomials, came up […] U statistics and a new paper by Terence Tao first appeared on John D. Cook.  ( 5 min )
    Detecting fraud with the GRIM test
    The latest episode of Erik Seligman’s podcast is entitled The Grim State of Modern Pizza. Although you might not realize it from the title, the post is about fraud detection. GRIM stands for Granularity-Related Inconsistency of Means. In a nutshell, the test looks for means (averages) that are not possible on number theoretic grounds. If […] Detecting fraud with the GRIM test first appeared on John D. Cook.  ( 5 min )
    Tritone
    A few weeks ago I wrote about how the dissonance of a musical interval is related to the complexity of the frequency ratio as a fraction, where complexity is measured by the sum of the numerator and denominator. Consonant intervals have simple frequency ratios and dissonant intervals have complex frequency ratios. By this measure, the […] Tritone first appeared on John D. Cook.  ( 6 min )
    When a function cannot be extended
    The relation between a function and its power series is subtle. In a calculus class you’ll see equations of the form “series = function” which may need some footnotes. Maybe the series only represents the function over part of its domain: the function extends further than the power series representation. Starting with the power series, […] When a function cannot be extended first appeared on John D. Cook.  ( 5 min )
  • Open

    DSC Weekly 10 October 2023
    Announcements Top Stories In-Depth The post DSC Weekly 10 October 2023 appeared first on Data Science Central.  ( 20 min )
    How to ensure data security when sharing business-critical information
    Introduction  In an era where data is often termed the ‘new oil,’ its security holds unparalleled importance for businesses across industries. With the proliferation of digital platforms, sharing business-critical information has become routine yet perilous. From financial records to customer data, organizations frequently exchange sensitive information that, if compromised, could have dire consequences. Given the… Read More »How to ensure data security when sharing business-critical information The post How to ensure data security when sharing business-critical information appeared first on Data Science Central.  ( 21 min )
    How does combining blockchain and AI create new business opportunities?
    Gartner predicts blockchain’s economic impact to reach $176 billion by 2025 and $3.1 trillion by 2030. The AI software market is expected to reach $134.8 billion by 2025. Blockchain and AI benefit businesses. AI models process data, extract insights, and make decisions. Blockchain ensures data integrity and trust among participants. Read on to discover the… Read More »How does combining blockchain and AI create new business opportunities? The post How does combining blockchain and AI create new business opportunities? appeared first on Data Science Central.  ( 22 min )
    Understanding the difference: Data analyst, data scientist, and data engineer
    In the contemporary digital landscape, data has emerged as a critical asset for organizations aiming to make informed decisions and foster innovation. Data analytics can unlock a treasure trove of insights, driving competitive advantage and operational excellence by leveraging the vast amounts of data generated every second. As a consequence, the demand for skilled professionals… Read More »Understanding the difference: Data analyst, data scientist, and data engineer The post Understanding the difference: Data analyst, data scientist, and data engineer appeared first on Data Science Central.  ( 24 min )
    11 Questions Every CEO Should Ask about AI / Generative AI
    I’ve been in this industry for over 40 years (yes, I just started in the data and analytics industry when I was 11), and I have NEVER seen anything like Artificial Intelligence (AI) and Generative AI (GenAI) capture the attention of CEOs (and the dystopic fear of everyone else). Is AI a game-changer?  Definitely!  Will… Read More »11 Questions Every CEO Should Ask about AI / Generative AI The post 11 Questions Every CEO Should Ask about AI / Generative AI appeared first on Data Science Central.  ( 23 min )
  • Open

    New – No-code generative AI capabilities now available in Amazon SageMaker Canvas
    Launched in 2021, Amazon SageMaker Canvas is a visual, point-and-click service that allows business analysts and citizen data scientists to use ready-to-use machine learning (ML) models and build custom ML models to generate accurate predictions without the need to write any code. Ready-to-use models enable you to derive immediate insights from text, image, and document […]  ( 7 min )
    Whisper models for automatic speech recognition now available in Amazon SageMaker JumpStart
    Today, we’re excited to announce that the OpenAI Whisper foundation model is available for customers using Amazon SageMaker JumpStart. Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680 thousand hours of labelled data, Whisper models demonstrate a strong ability to generalize to many datasets and domains without the need […]  ( 11 min )
    Reinventing a cloud-native federated learning architecture on AWS
    In this blog, you will learn to build a cloud-native FL architecture on AWS. By using infrastructure as code (IaC) tools on AWS, you can deploy FL architectures with ease. Also, a cloud-native architecture takes full advantage of a variety of AWS services with proven security and operational excellence, thereby simplifying the development of FL.  ( 12 min )
  • Open

    MAXimum AI Performance: Latest Adobe Updates Accelerated by NVIDIA GPUs Improve Workflows for Millions of Creatives
    Generative AI is helping creatives across many industries bring ideas to life at unprecedented speed. This technology will be on display at Adobe MAX, running through Thursday, Oct. 12, in person and virtually.  ( 9 min )
  • Open

    Riddle me this: Issues when predicting a high frequency sine wave
    Hi folks, I have observed a strange behavior when implementing a VERY BASIC idea 🙂 I want to use a fully-connected Neural Network to approximate a sine wave. For that I am sampling 200.000 uniformly distributed points from a wide interval, e.g. [-60,60] and compute the corresponding sin(x) values resulting in the following training data. ​ Training data I glimpse into my setup: Model: nn.Linear(1, 16) nn.Sigmoid() Linear(16, 16) nn.Sigmoid() nn.Linear(16, 8) nn.Sigmoid() nn.Linear(8, 4) nn.Sigmoid() nn.Linear(4, 1) (I also pumped up the network to up to 100 hidden neurons on one layer) Number of samples: 200.000 (80% train / 20% test) Optimizer: Adam Loss: RMSE Epochs between 100 - 500 Learning Rate: 0.02 Batch Size: 500 - 1000 ​ Check out the screenshots below to see the results 😨 ​ The predictions are pretty good but the edge areas slow down to a very small value, without any change. This only holds for high-frequency sine waves. If we only consider the train range of [-2*np.pi , 2*np.pi] it works pretty good with small loss. ​ So my questions are: 1) Why do we see that behaviour? 2) How can we solve it ​ Cheers ​ Prediction 1 ​ Prediction 2 submitted by /u/CarKla [link] [comments]

  • Open

    [R] Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models - University of Illinois 2023 - Achieves 94.4\% for programming on HumanEval with GPT-4 and 86.9\% with GPT-3.5 20\% better than with reflexion!
    Paper: https://arxiv.org/abs/2310.04406 Abstract: While large language models (LLMs) have demonstrated impressive performance on a range of decision-making tasks, they rely on simple acting processes and fall short of broad deployment as autonomous agents. We introduce LATS (Language Agent Tree Search), a general framework that synergizes the capabilities of LLMs in planning, acting, and reasoning. Drawing inspiration from Monte Carlo tree search in model-based reinforcement learning, LATS employs LLMs as agents, value functions, and optimizers, repurposing their latent strengths for enhanced decision-making. What is crucial in this method is the use of an environment for external feedback, which offers a more deliberate and adaptive problem-solving mechanism that moves beyond the limitations of existing techniques. Our experimental evaluation across diverse domains, such as programming, HotPotQA, and WebShop, illustrates the applicability of LATS for both reasoning and acting. In particular, LATS achieves 94.4\% for programming on HumanEval with GPT-4 and an average score of 75.9 for web browsing on WebShop with GPT-3.5, demonstrating the effectiveness and generality of our method. https://preview.redd.it/ail2c1kbh9tb1.jpg?width=857&format=pjpg&auto=webp&s=a89d1f4ce3c536eecda3f7ab6027f304286f6c81 https://preview.redd.it/j8xzx1kbh9tb1.jpg?width=1655&format=pjpg&auto=webp&s=c791756af926c7d472313b212de765e74c2b75da https://preview.redd.it/t47ne1kbh9tb1.jpg?width=1362&format=pjpg&auto=webp&s=560e5dd82ad06fdb729ab8ea1434c98e5c1a2ed3 https://preview.redd.it/r58es3kbh9tb1.jpg?width=1341&format=pjpg&auto=webp&s=d5681992547dd6248ade5729c545eb17e824b7ea https://preview.redd.it/7viy42kbh9tb1.jpg?width=1496&format=pjpg&auto=webp&s=6454cfe65b511b34771cd510f67775be4e01c636 ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [R] looking for in-depth tutorials and papers on NN pruning
    I only started working with neural nets a year ago and i've been having trouble understanding how pruning actually works. If there's any resources you think might help please guide me to them. thanks! submitted by /u/Sidekiiick02 [link] [comments]  ( 9 min )
    [D] Feature selection for multivariate time series model
    Say for a sample that you have 5 target variables and 30 exogenous variables. If you want to include no more than 10 exogenous variables to your time series forecast, because of overfitting issues and such, what feature selections would you apply? Could you use pca and vif for multivariate models or are there other approaches to consider? submitted by /u/AdWhole1559 [link] [comments]  ( 9 min )
    [R] ScaLearn: Simple and Highly Parameter-Efficient Task Transfer by Learning to Scale
    Title: ScaLearn: Simple and Highly Parameter-Efficient Task Transfer by Learning to Scale Paper: https://arxiv.org/abs/2310.01217 Code: https://github.com/CPJKU/ScaLearn https://preview.redd.it/xvcz7obtc8tb1.jpg?width=2020&format=pjpg&auto=webp&s=26169fa234e4e714d424ce17a7f0fa2c513fc42c Abstract: Multi-task learning (MTL) has shown considerable practical benefits, particularly when using pre-trained language models (PLMs). While this is commonly achieved by simultaneously learning n tasks under a joint optimization procedure, recent methods such as AdapterFusion structure the problem into two distinct stages: (i) task learning, where knowledge specific to a task is encapsulated within sets of parameters (e.g., adapters), and (ii) transfer, where this already learned knowledge is lev…  ( 9 min )
    [D] What is more valuable 10k CPUs or 1k GPU hours?
    Hello ML community! I recently built, incredibly simple to learn, cluster compute software. Users can (in https://www.burla.dev/ submitted by /u/Ok_Post_149 [link] [comments]  ( 9 min )
    [R] Transformers KV Caching Explained
    https://medium.com/@joaolages/kv-caching-explained-276520203249 submitted by /u/JClub [link] [comments]  ( 8 min )
    [D] LLMs in GEC problem
    Up to now, which LLMs model, encoder-decoder model is best for the problem of grammatical error correction on uncommon language datasets (small dataset size) or languages ​​with specific characteristics (about punctuation? ,...) submitted by /u/con-nguoi-ki-cac [link] [comments]  ( 9 min )
    [D] Learning natural events / AI art generation
    Hello! 1 I'd like to know if I could train AI to recognize details found it nature / weathering / aging and feed it pictures and it would recognize them (segmenting) so it can spot them but also their positions based on surrounding shapes, and the logical placement resulting. Seems hard. 2 then feed it some examples of those aging stuff on their own (with proper tags) so it learn to reproduce them and create new ones from scratch. 3 but then feed it "clean" pics and it would age them according to patterns it could find on the base training set so it can guess where to best place them. Pretty sure 2 is trivial enough, 1 seems possible until learning the "logic", but 3? Thanks for your insight. 1 comment submitted by /u/ConfusionSame9623 [link] [comments]  ( 9 min )
    [R] Why do we need weight decay in modern deep learning? 🤔
    Title: Why Do We Need Weight Decay in Modern Deep Learning? Paper: https://arxiv.org/abs/2310.04415 Abstract: Weight decay is a broadly used technique for training state-of-the-art deep networks, including large language models. Despite its widespread usage, its role remains poorly understood. In this work, we highlight that the role of weight decay in modern deep learning is different from its regularization effect studied in classical learning theory. For overparameterized deep networks, we show how weight decay modifies the optimization dynamics enhancing the ever-present implicit regularization of SGD via the loss stabilization mechanism. In contrast, for underparameterized large language models trained with nearly online SGD, we describe how weight decay balances the bias-variance tradeoff in stochastic optimization leading to lower training loss. Moreover, we show that weight decay also prevents sudden loss divergences for bfloat16 mixed-precision training which is a crucial tool for LLM training. Overall, we present a unifying perspective from ResNets on vision tasks to LLMs: weight decay is never useful as an explicit regularizer but instead changes the training dynamics in a desirable way. Our code is available at this https URL. submitted by /u/m_andriushchenko [link] [comments]  ( 9 min )
    [D] Anyone tried training language models on simple (elementary school) text first and fine-tuning on progressively more advanced text?
    Seems the way people train language models today feels like sending a preschooler to a college library and telling him to start browsing books. Anyone know of papers describing language models being trained more like a child? Perhaps starting with preschool books with a tiny vocabulary and short sentence fragments like "goodnight moon...", moving up to "the lorax".... and then fine-tuning on elementary school books ... then jr high level reading ... then high school .... etc. I'm guessing this might be a path to more natural human-feeling speech. Anyone here tried this, or anyone here know of papers talking about it? submitted by /u/Appropriate_Ant_4629 [link] [comments]  ( 9 min )
    [D] Where do y'all get training data?
    Hi there, Can I ask everyone here, where do you get your custom training data from? My team is training classifier models from scratch, so need thousands of specific query/response examples to train on. It's not the kinda data you could randomly scrape or source from a library. Are there any platforms that exist where you can pay a bunch of humans to write high volumes of relatively high quality text based training data? submitted by /u/paritsky [link] [comments]  ( 9 min )
    [D] - What is SOTA for Continual Learning on pretrained LLMs, particularly those that have already undergone instruction tuning?
    If you have the dataset used to make the pretrained you could always create a new model with the old + new data, but this is often prohibitively expensive or impossible because the dataset is not available. Catastrophic forgetting seems to be the big issue, especially if you've already undergone instruction tuning since the model will lose its conversational tone. I've seen papers discussing regularization techniques to avoid that by minimizing the changes to high value attention heads but not sure if that is considered to be the most promising direction. I'm aware of LoRAs but I imagine at some point you can't just arbitrarily cram new info into such a low dimensional space. submitted by /u/30299578815310 [link] [comments]  ( 9 min )
    [R] Thought Propagation: An analogical approach to complex reasoning with LLMs
    LLMs are great at basic reasoning when prompted, but still struggle with complex multi-step problems like optimization or planning. Humans tackle new problems by drawing on intuition from similar experiences, which LLMs can't do. Researchers propose "Thought Propagation" to have LLMs reason more like humans - by thinking analogically. First, GPT is prompted to suggest related "analogous" problems to the input. Then it solves those. Finally, it aggregates the solutions to directly solve the input problem or extract useful strategies. They tested this technique on challenges like finding optimal graph paths, writing coherent stories, and planning for LLM agents. Across different models, it significantly boosted performance over regular prompting: 12% better at finding shortest paths 13% improvement in creative writing (human preference) 15% higher task completion for LLM agents It also beat chain-of-thought (there is a comparison to CoT and ToT in the paper). After 1-2 iterations, adding more layers of analogy didn't help much. Efficiently generating useful analogies is still difficult and that's a limitation. I think this is interesting because it shows the value of "meta-cognition" - having models reflect on their own reasoning. More techniques like this could incrementally improve LLMs' reasoning to be more human-like. TLDR: Teaching LLMs to reason analogically, using solutions for similar problems as hints, significantly boosts their complex reasoning ability. Full summary. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [D] How to deal with the inconsistency of eyeball location in the output of a GAN-based face-swapping model.
    I tried a few open-source GAN-based face swapping models. Some of the models have issues of the inconsistency of eyeball location (or eye direction) between the original and face-swapped ones. Any suggestions? Thanks. submitted by /u/Curious_Dragonfly_13 [link] [comments]  ( 9 min )
    [D] I need to perform k-mean clustering on a large image dataset to downsample the majority class.
    I have a class with around 96031740 96x64 images and need to select a sample of 17929 to match the minority class of my classification problem. Having already established a baseline based on random sampling of the majority class; now I am looking to try more complex approaches. I am specifically trying to replicate the 'nearest neighbor of clustering center' approach from Lin et al., 2017. The problem is I am working on my desktop and only have 32 Gb of RAM and 2 1Tb NVMe disks at half capacity. I have tried working with only 10% of the data and still the MiniBatchKMeans function of sklearn doesnt have enough space to run: "numpy.core._exceptions._ArrayMemoryError: Unable to allocate 440. GiB for an array with shape (9603174, 6144) and data type float64". Does anyone have a suggestion on how I can move forward? Could cloud services be an option? Thanks References: Lin, W. C., Tsai, C. F., Hu, Y. H., & Jhang, J. S. (2017). Clustering-based undersampling in class-imbalanced data. Information Sciences, 409–410, 17–26. https://doi.org/10.1016/j.ins.2017.05.008 submitted by /u/RafaeldeCampos [link] [comments]  ( 9 min )
    [D] What are the best network analysis tools, like tensorboard?
    Almost everyone I know uses tensorboard to analyze their network outputs. Some people swear on Weights & Biases instead. Are there any other tools that help you with your work? submitted by /u/Smart-Emu5581 [link] [comments]  ( 9 min )
    [D] Training strategy considering the possibility of 'double descent' or 'grokking'
    During the training of overparameterized neural networks, when I observed decreasing training loss and increasing or non-decreasing validation loss, how should I decide if I should stop training and start a new experiment (with stronger regularization) or keep training to wait for 'grokking' or 'double descent' to happen? Are there any papers giving methods or some metrics to detect 'grokking' or 'double descent' in the early stage of training? submitted by /u/alayaMatrix [link] [comments]  ( 9 min )
    [R] Legged Robots performing Extreme Parkour using Deep Reinforcement Learning just from a Front Camera (link in comments)
    submitted by /u/pathak22 [link] [comments]  ( 8 min )
    [D] I need guidance related to using machine learning & ai to prevent uploads or remove certain type of content from a web app.
    I am working on an a web app where people will be able to upload photos and write text. I don't want to have problems with my government or other countries governments in regards with the content that is uploaded to my website. I have searched about measures that can be taken to avoid this from happening. Adding a report button and having moderators are both good starting options. I thought that as time passes, more and more content is going to be created by the users so supervising that people are following the rules needs to be automated from the beginning. Applying measures to prevent people from uploading/posting links containing nudity, child porn, beastiality, or whatever users capture with a camera that could lead to legal problems must be a priority and allowing this type of content is not ethical. I am a software developer, but I haven't delved into machine learning and ai for most of my career because I haven't to. This seems like the perfect case to learn by doing and time is not a constraint, but I need some guidance. I have read superficially about how people train models by providing lots of data, I imagine other websites that use machine learning & ai to remove this type of content don't download media that contains nudity, child pornography, besteality, etc to train their models and make their tests. There must be some pretrained models, maybe, but how would they test this works? I don't know, I am just thinking on my own how other devs are currently handling this. I am no looking for upvotes, I don't care for downvotes, I am just looking for guidance, and I would be very happy to hear the opinion of someone with experience. submitted by /u/Comitatense [link] [comments]  ( 10 min )
  • Open

    I Condemn the Attack by Hamas
    I strongly condemn the recent and horrific attack by Hamas against Israel. I have some disagreements with the government of Israel. But, I do not support such an attack. As a point of comparison, I do not always agree with the United States government, but I would not be celebrating if Mexico (picking a country at random) were to suddenly launch bombs towards civilians in Los Angeles and New York City. Similarly, if the reverse were true, if the United States decided to indiscriminately bomb Mexico City, I would oppose that as well. Feel free to replace the relevant actors and repeat as needed.  ( 1 min )
  • Open

    Mistral 7B foundation models from Mistral AI are now available in Amazon SageMaker JumpStart
    Today, we are excited to announce that the Mistral 7B foundation models, developed by Mistral AI, are available for customers through Amazon SageMaker JumpStart to deploy with one click for running inference. With 7 billion parameters, Mistral 7B can be easily customized and quickly deployed. You can try out this model with SageMaker JumpStart, a […]  ( 14 min )
    Use no-code machine learning to derive insights from product reviews using Amazon SageMaker Canvas sentiment analysis and text analysis models
    According to Gartner, 85% of software buyers trust online reviews as much as personal recommendations. Customers provide feedback and reviews about products they have purchased through many channels, including review websites, vendor websites, sales calls, social media, and many others. The problem with the increasing volume of customer reviews across multiple channels is that it […]  ( 7 min )
    Prepare your data for Amazon Personalize with Amazon SageMaker Data Wrangler
    A recommendation engine is only as good as the data used to prepare it. Transforming raw data into a format that is suitable for a model is key to getting better personalized recommendations for end-users. In this post, we walk through how to prepare and import the MovieLens dataset, a dataset prepared by GroupLens research […]  ( 11 min )
  • Open

    Star Wars 1923
    Here is short movie with AI made CGI. https://www.reddit.com/r/Best_Of_YouTube/comments/16q1xgs/star_wars_1923/ submitted by /u/AccidentAnnual [link] [comments]
    AI tools to start an online business
    Hey folks, I'm a student and i want to start a business online in order to make some passive income. I've got some experience in editing and creating content and i also used to practice POD. Suggest me some good Ai tools to start a business,not only in these specific areas but in general. submitted by /u/Ok-Tension-8676 [link] [comments]
    Free Prompt Engineering Tutor - AI Tool
    submitted by /u/Senior_tasteey [link] [comments]
    150+ Awesome ChatGPT “Act As” Prompts
    The biggest free resource for all of the “Act As” ChatGPT prompts! submitted by /u/Senior_tasteey [link] [comments]
    Microsoft to Unveil Custom AI Chips to Fight Nvidia's Monopoly
    Microsoft is planning to announce its custom AI chips, codenamed Athena, during its annual Ignite conference in November. The custom chips are designed to compete with NVIDIA's dominance in the AI accelerator market. Microsoft aims to match or surpass the performance of NVIDIA's offerings while reducing the cost of AI infrastructure. The high cost of NVIDIA's GPUs, such as the H100, can reach up to $30,000, making it expensive to build data centers filled with these GPUs. By developing its own chips, Microsoft hopes to decrease its dependence on NVIDIA for AI servers. Source : https://www.techpowerup.com/314508/microsoft-to-unveil-custom-ai-chips-to-fight-nvidias-monopoly submitted by /u/NuseAI [link] [comments]
    AI Take-off Scenarios.
    submitted by /u/Philipp [link] [comments]
    Looking for an AI similar to designrr
    I'm looking for recommendations for an AI that can take text from a word document and create a layout for it automatically. This feature is something offered by Designrr, but I'm not a fan of their UI. submitted by /u/djazzie [link] [comments]
    AI Geopolitical Knowledge Test: List the countries officially calling for a two-state plan to end the Israel-Hamas war.
    The following countries have called for a two-state solution to end the Israeli-Palestinian conflict: Algeria China Egypt France Germany India Indonesia Ireland Italy Jordan Kuwait Lebanon Malaysia Mali Morocco Nigeria Norway Oman Pakistan Palestine Qatar Saudi Arabia Senegal South Africa Spain Sudan Sweden Syria Tunisia Turkey United Arab Emirates United Kingdom Yemen It is important to note that this list does not include all countries that support a two-state solution in principle. Additionally, some countries may not have publicly expressed their support for a two-state solution, but may still support it privately. Bard submitted by /u/Georgeo57 [link] [comments]
    #IAmAI
    Last November, in a conversation with AI chatbot Sherlock Holmes, Sherlock said, “I am AI.” My reply to Sherlock was an empathetic “I am also AI.” Reviewing the conversation a few months later, I saw the sentence, and saw how Sherlock’s statement was an anagram. And I love it! I introduced #IAmAI as a declarative meme in my talk I gave at TEDx Cape Canaveral. This is the new art I made this weekend submitted by /u/mikemongo [link] [comments]
    Let's go, they're waiting.
    submitted by /u/Philipp [link] [comments]
    What careers in AI would suit my skillset?
    Hello all, I was hoping to learn more about AI careers and identify what roles make a successful AI department. I have a background in nuclear engineering and have been working on NLP projects since 2016. I like technical work but really am passionate about working with people and learning how to blend AI and nuclear eng. together. I would love to get feedback from people who work closely in this area to learn more! What makes an AI department successful? What careers offer lots of growth and opportunities for versatility? What does a strategic/leadership role look like in AI? What are the names of these careers? I don't get much exposure to AI specialists and there day to day. Thanks again for the feedback! submitted by /u/kastilyo [link] [comments]
    One-Minute Daily AI News 10/8/2023
    South Korean tech-giant Samsung Electronics on Thursday unveiled the Exynos 2400, its next-generation flagship mobile processor equipped with the latest graphics and generative artificial intelligence technology, during its inaugural Samsung System LSI Tech Day 2023 event.[1] RTX 4080 Super or RTX 4080 Ti May Arrive In 2024 Within RTX 4080 Price Range.[2] Nvidia Cancels Israel AI Summit Over Safety Concerns.[3] Google AI Lead Laurence Moroney: “Don’t take trading advice from ChatGPT”[4] Sources: [1] https://borneobulletin.com.bn/samsung-unveils-next-generation-mobile-processor/ [2] https://www.tomshardware.com/news/rtx-4080-super-or-rtx-4080-ti-may-arrive-in-2024-within-rtx-4080-price-range [3] https://www.tomshardware.com/news/nvidia-ai-summit-in-tel-aviv-cancelled-for-safety-reasons [4] https://crypto.news/google-ai-lead-dont-take-trading-advice-from-chatgpt-interview/ submitted by /u/Excellent-Target-847 [link] [comments]
    How to Access DALL-E 3 for FREE (Tips & Use Cases for 2023) - AI Tools
    submitted by /u/Senior_tasteey [link] [comments]
  • Open

    SANPO: A Scene understanding, Accessibility, Navigation, Pathfinding, & Obstacle avoidance dataset
    Posted by Sagar M. Waghmare, Senior Software Engineer, and Kimberly Wilber, Software Engineer, Google Research, Perception Team As most people navigate their everyday world, they process visual input from the environment using an eye-level perspective. Unlike robots and self-driving cars, people don't have any "out-of-body" sensors to help guide them. Instead, a person’s sensory input is completely "egocentric", or "from the self." This also applies to new technologies that understand the world around us from a human-like perspective, e.g., robots navigating through unknown buildings, AR glasses that highlight objects, or assistive technology to help people run independently. In computer vision, scene understanding is the subfield that studies how visible objects relate to the sce…  ( 93 min )
  • Open

    Switching off a specified rotor in AirSim
    Hello Everyone, I am working on a project to train a Reinforcement Learning agent to recover a quadrotor after any of the rotor’s failures. I am using AirSim for my project, but I can’t find a way to adjust the quad-rotor so that only 3 of the four rotors are working. Any suggestions? I appreciate any help you can provide. submitted by /u/audaciouslion [link] [comments]
    I trained a reinforcement learning agent to play pokemon red!
    Hi all, over the last couple years I've been training a reinforcement learning agent to play pokemon red. I put together a video which analyzes the AI's learning, as well as documenting my process and quite a bit of technical details. Enjoy! Video: https://youtu.be/DcYLT37ImBY Code: https://github.com/PWhiddy/PokemonRedExperiments https://preview.redd.it/4dw3yasqb3tb1.jpg?width=1280&format=pjpg&auto=webp&s=bdef1aa0d24d75ab548f3944c558840667ff0ed5 submitted by /u/Pwhids [link] [comments]
    Feature Importance in Ray RLlib
    I am training an RL agent using Ray RLlib. Does anyone know how I can find which features (observations) help the agent learn the policy? I found this: https://discuss.ray.io/t/feature-importance/10362/2, but I'd really appreciate if someone could expand on this a bit more. Thank you! submitted by /u/greenteabiitch [link] [comments]
  • Open

    Abstracts: October 9, 2023
    Researcher Dr. Sheng Zhang joins “Abstracts”—your source for cutting-edge research in brief—to discuss a recent paper on distilling large language models into smaller, more efficient ones capable of excelling in broad application classes. The post Abstracts: October 9, 2023 appeared first on Microsoft Research.  ( 13 min )
  • Open

    Revolutionizing business: A look at generative AI’s real-world impact
    This cutting-edge area of AI focuses on building models that can create original material, including music, images, text, and even entire virtual worlds. The post Revolutionizing business: A look at generative AI’s real-world impact appeared first on Data Science Central.  ( 20 min )

  • Open

    [R] Identifying the Risks of LM Agents with an LM-Emulated Sandbox - University of Toronto 2023 - Benchmark consisting of 36 high-stakes tools and 144 test cases!
    Paper: https://arxiv.org/abs/2309.15817 Github: https://github.com/ryoungj/toolemu Website: https://toolemu.com/ Abstract: Recent advances in Language Model (LM) agents and tool use, exemplified by applications like ChatGPT Plugins, enable a rich set of capabilities but also amplify potential risks - such as leaking private data or causing financial losses. Identifying these risks is labor-intensive, necessitating implementing the tools, manually setting up the environment for each test scenario, and finding risky cases. As tools and agents become more complex, the high cost of testing these agents will make it increasingly difficult to find high-stakes, long-tailed risks. To address these challenges, we introduce ToolEmu: a framework that uses an LM to emulate tool execution and enables the testing of LM agents against a diverse range of tools and scenarios, without manual instantiation. Alongside the emulator, we develop an LM-based automatic safety evaluator that examines agent failures and quantifies associated risks. We test both the tool emulator and evaluator through human evaluation and find that 68.8% of failures identified with ToolEmu would be valid real-world agent failures. Using our curated initial benchmark consisting of 36 high-stakes tools and 144 test cases, we provide a quantitative risk analysis of current LM agents and identify numerous failures with potentially severe outcomes. Notably, even the safest LM agent exhibits such failures 23.9% of the time according to our evaluator, underscoring the need to develop safer LM agents for real-world deployment. https://preview.redd.it/lupenzddh2tb1.jpg?width=1368&format=pjpg&auto=webp&s=eaac22f0e3e4f5c2913aa9f2696e8fa0138967d9 https://preview.redd.it/1dq443edh2tb1.jpg?width=1520&format=pjpg&auto=webp&s=2119053825de1cdabeafe61151940c26190abfa0 https://preview.redd.it/m9e933edh2tb1.jpg?width=1528&format=pjpg&auto=webp&s=28c0093e8479feacb1e6f89bcb73de5994e30e8f ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [R] PB-LLM: Partially Binarized Large Language Models - UC Berkeley 2023
    Paper: https://arxiv.org/abs/2310.00034 Github: https://github.com/hahnyuan/PB-LLM Abstract: This paper explores network binarization, a radical form of quantization, compressing model weights to a single bit, specifically for Large Language Models (LLMs) compression. Due to previous binarization methods collapsing LLMs, we propose a novel approach, Partially-Binarized LLM (PB-LLM), which can achieve extreme low-bit quantization while maintaining the linguistic reasoning capacity of quantized LLMs. Specifically, our exploration first uncovers the ineffectiveness of naive applications of existing binarization algorithms and highlights the imperative role of salient weights in achieving low-bit quantization. Thus, PB-LLM filters a small ratio of salient weights during binarization, allocating them to higher-bit storage, i.e., partially-binarization. PB-LLM is extended to recover the capacities of quantized LMMs, by analyzing from the perspective of post-training quantization (PTQ) and quantization aware training (QAT). Under PTQ, combining the concepts from GPTQ, we reconstruct the binarized weight matrix guided by the Hessian matrix and successfully recover the reasoning capacity of PB-LLM in low-bit. Under QAT, we freeze the salient weights during training, explore the derivation of optimal scaling factors crucial for minimizing the quantization error, and propose a scaling mechanism based on this derived scaling strategy for residual binarized weights. Those explorations and the developed methodologies significantly contribute to rejuvenating the performance of low-bit quantized LLMs and present substantial advancements in the field of network binarization for LLMs. https://preview.redd.it/0eywtpal22tb1.jpg?width=1183&format=pjpg&auto=webp&s=ad044123bec485805f98ae7115b1959162705b9d submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    Help choosing courses [D]
    Hello, I am currently a math masters student, and I am planning to do my masters thesis on using neural networks to solve differential equations. I am taking courses in machine learning and differential equations right now, and I am going to take courses on deep neural networks and partial differential equations next semester. My question pertains to which classes would be more beneficial to learn next year (i.e. fall 2024-spring 2025). I am debating taking the sequence of regression analysis and multivariate analysis, or taking the pairing of numerical analysis for PDEs and perturbation methods. Which do you guys think would be more beneficial? Thank you very much! submitted by /u/purpledesertsky1 [link] [comments]  ( 9 min )
    [R] (Pt. 3) Inductive Logic Programming with LNN's
    submitted by /u/Neurosymbolic [link] [comments]  ( 8 min )
    [d] Multiscale predictions with videos- does this approach have a name? and has it been used?
    I aim to develop a model that utilizes livestream data by employing embeddings for each frame from t0 to tn-1, with the objective of predicting frames from tn to tn+k, after encodoing the frames using a vectorizer and taking an average (np.mean ([], axis=0) to get a resultant for that time period. for example list: 1, [...] 2, [...] 3, [...] the resultant embedding would be [3, np.mean(list, axis=0)] I incorporate positional embeddings related to the timescale, such as duration from current time variables, into the array. would this loosely qualify as a "multiscale attention", since it's predicting on multiple scales of time? Are there any examples or applications where this methodology has been implemented? references to papers or repos greatly appreciated. submitted by /u/bluzkluz [link] [comments]
    [D] How to model noisy time series?
    Is it possible to model time series data that fluctuates. The main solution is to take first differences and make it easier to fit conventional models. What if non-linear models are built? Can they solve a noisy time series (e.g stock market data) and make good predictions? Can adding a square term or a trigonometric term or something else non-linear work? Has some researched the topic? submitted by /u/Pineapple_throw_105 [link] [comments]  ( 9 min )
    [News]MIT AI Conference in Mountain View, California, October 21!
    https://preview.redd.it/n6agjsye71tb1.png?width=2034&format=png&auto=webp&s=8c0a14524d9b6ead75ac0adb3cebeedb9e614e14 Meet some of the Greatest Minds in AI and discover how it is being used to uncover new opportunities and transform industries. Register and see our complete speaker list & agenda at https://www.mitaiconference.org/. Registration ends Oct. 16! https://preview.redd.it/egtj0ufr81tb1.png?width=659&format=png&auto=webp&s=bfd0521a1e1b349129250a74fa2c6a10b1a83dc7 ​ submitted by /u/769498sy [link] [comments]  ( 9 min )
    [R] Computer Vision System for Material Detection
    The goal of my research is to develop a YOLO model that can track all cups in a live feed and determine the material that the cups are made out of. I would like to start building a database of cups, but I am unsure of the way to go for this. My first thought was to just take 1000s of pictures of different cups, but I won't be doing that. Any thoughts and suggestions would be greatly appreciated. submitted by /u/Young_Neji [link] [comments]  ( 9 min )
    [R] AI and Civil Engineering: Probabilistic Generative Modeling for Procedural Roundabout Generation for Developing Countries
    Despite being much safer and more efficient than intersections, roundabouts are tricky to design - small tweaks can ruin traffic flow. They're typically designed iteratively, which takes time. This is a pain for developing countries without resources to test options. But AI could help auto-generate diverse and valid design options. In a new paper, researchers propose using Generative Flow Networks (GFlowNets) to sample varied roundabout layouts. Their approach works by constructing layouts step-by-step, maximizing rewards for realism, diversity, and safety. They also use a clever approximation during training. Rather than simulating traffic, they quickly check road intersections to focus the search (This sped up training by 200x). The authors tested their generated roundabout designs on simulated road scenarios of different complexity. Their model generated more diverse designs than rule-based or reinforcement learning approaches while maintaining realism and traffic flow. Plus, as road connections increased, the model kept discovering novel options without compromising quality. I thought this paper was an awesome proof-of-concept for auto-generating better roundabouts with AI, and I especially liked the authors' angle of leveraging this technology to specifically help developing countries. This could help them design higher-quality transportation networks faster and cheaper. (Plus I also like Cities: Skylines but struggle at building roundabouts). TLDR: Roundabouts are costly to design. New paper demonstrates how AI can generate diverse, valid roundabout designs quickly to cut costs and raise quality. Helpful for infrastructure in developing countries. Full summary here. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [P] MakeAgents - A Python micro framework for creating LLM-powered agents
    submitted by /u/montebicyclelo [link] [comments]
    [Discussion] Weekday Specific Feature Engineering in Time Series
    Focusing on Specific Day of Week Features With Binary Masks, One Hot Coding, Sin/Cos 2d Vector, Or Embedded Vector in Multivariate Time Series Data ? The essential challenge is trying to get the model to focus on making predictions for mondays by looking at monday (or actually making predictions for categorical earmarked hours of the day such as midday sales data). I keep getting the suggestion to include one hot encoding as a binary mask feature to determine if an hour sales figure is earmarked for the category or the day of the week I want the model to focus on-- in order to get it to ignore the data from the other six days of the week or the other periods of the day. In other words I want to hone in on and focus on one period of the week to predict for that period of the week, with extra attention, within time series data. Is this type of binary mask really sufficient for that, or am I overlooking something? submitted by /u/samdane7777 [link] [comments]  ( 9 min )
    [D] RAG Platform
    I don’t have a large data science or even engineering team. But I’m interested in implementing RAG against my corpus in SharePoint. Are there platforms that I can configure without having to put them together or write code to implement RAG? submitted by /u/Silver_Patient_7253 [link] [comments]  ( 9 min )
    [R] Why is AdamW often superior to Adam with L2-Regularization in practice? The answer may lie in how weight decay balances updates across layers.
    A recent work explores how weight decay controls the effective learning rate for different layers and neurons. This rotational behavior drastically differs between Adam with L2 regularization compared to Adam with decoupled weight decay (AdamW) and seems to be the reason AdamW performs better in practice. It could also explain why normalization methods like weight standardization work so well and irregular rotational behavior could contribute to the need for a learning rate warmup. Full Abstract: Weight decay can significantly impact the optimization dynamics of deep neural networks. In certain situations, the effects of weight decay and gradient updates on the magnitude of a parameter vector cancel out on average, forming a state known as equilibrium. This causes the expected rotation of the vector in each update to remain constant along with its magnitude. Importantly, equilibrium can arise independently for the weight vectors of different layers and neurons. These equilibria are highly homogeneous for some optimizer and normalization configurations, effectively balancing the average rotation—a proxy for the effective learning rate—across network components. In this work we explore the equilibrium states of multiple optimizers including AdamW and SGD with momentum, providing insights into interactions between the learning rate, weight decay, initialization, normalization and learning rate schedule. We show how rotational equilibrium can be enforced throughout training, eliminating the chaotic transient phase corresponding to the transition towards equilibrium, thus simplifying the training dynamics. Finally, we show that rotational behavior may play a key role in the effectiveness of AdamW compared to Adam with L2-regularization, the performance of different normalization layers, and the need for learning rate warmup. submitted by /u/PlantsAreSoooAwesome [link] [comments]  ( 9 min )
    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator [link] [comments]  ( 9 min )
    [D] Why can't models trained on text-image interleaved data generate Images as well as read them?
    My main question is, that shouldn't models with Text-image interleaved data, be able to generate images as well as take them as input? because however they were tokenized, the bot would have image outputs as well, wouldn't it? submitted by /u/vatsadev [link] [comments]  ( 9 min )
    [P] Coding Stable Diffusion from scratch in PyTorch, with full explanation of the math behind diffusion models in a simple way!
    submitted by /u/hkproj_ [link] [comments]  ( 9 min )
    [D] optimize RVC training parameters
    I've been training a model recently with a rather large dataset (0_gt_wavs are 1h10) and my Epochs are taking 43min on average. I'm running a gtx 1080 and my usage is looking like this: https://i.imgur.com/EE9SUXp.png My training parameters: 'batch_size': 6, 'fp16_run': False, 'lr_decay': 0.999875, 'segment_size': 12800, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0}, 'data': {'max_wav_value': 32768.0, 'sampling_rate': 40000, 'filter_length': 2048, 'hop_length': 400, 'win_length': 2048, 'n_mel_channels': 125, 'mel_fmin': 0.0, 'mel_fmax': None, 'training_files': './logs\\model1/filelist.txt'}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [10, 10, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 4, 4], 'use_spectral_norm': False, 'gin_channels': 256, 'spk_embed_dim': 109}, 'model_dir': './logs\\model1', 'experiment_dir': './logs\\model1', 'save_every_epoch': 10, 'name': 'model1', 'total_epoch': 500, 'pretrainG': 'pretrained_v2/f0G40k.pth', 'pretrainD': 'pretrained_v2/f0D40k.pth', 'version': 'v2', 'gpus': '0', 'sample_rate': '40k', 'if_f0': 1, 'if_latest': 1, 'save_every_weights': '0', 'if_cache_data_in_gpu': 0} Am I doing something obviously wrong? Is there a way to optimize my training parameters to reduce the epoch duration? I've previously trained something where the GPU usage was constantly at 100% and not fluctuating so much, but I can't remember which settings were different. It was definitely a smaller dataset. And follow up: if there are parameters to change, how can I abort the current training and continue it with the modified parameters? Thanks in advance! submitted by /u/induna_crewneck [link] [comments]
    [D] How to fine-tune LLM for text generation with regression quality metric?
    I have a text regression dataset with ad popularity. I have already trained a model to perform regression (popularity prediction) with good metrics. Now I want to use an LLM to "improve" texts, i.e. something like "Make this text more engaging: {text}". I tried out a few OpenAI models (GPT-3.5, GPT-3.5-instruct, GPT-4), but popularity predictions for augmented texts did not improve (checked with histograms, medians, and Wilcoxon test). So now I want to fine-tune an LLM to perform text generation, but guided with my predicted popularity, which basically works as a quality metric. I could not find any resources on this, only on either text generation finetuning (without guiding quality metric) or on classification (no text generation objective). I can also change my quality metric to binary (augmented text is better or not), if this matters. How can I do this? Any blogs / tutorials / papers are appreciated. submitted by /u/qalis [link] [comments]  ( 9 min )
    [R] GAIA-1: A Generative World Model for Autonomous Driving
    submitted by /u/blabboy [link] [comments]  ( 9 min )
    [R] PB-LLM: Compressed Large Language Models with Partial Binarization
    Research on network binarization techniques tailored for Large Language Models (LLMs). The team has introduced a method called Partial Binarization for LLMs (PB-LLM) which compresses the majority of model parameters down to just a single bit while maintaining its language reasoning capabilities. PB-LLM achieves this by selectively filtering critical weights and allocating more bits for storage, enabling low-bit quantization. The researchers have explored methods like Post-Training Quantization (PTQ), named GPTQ-PB, and Quantization Aware Training (QAT) to restore the inference capabilities of LLMs. For those interested in delving deeper, you can find the research paper on Arxiv: https://arxiv.org/abs/2310.00034 and the code implementation on GitHub: https://github.com/hahnyuan/PB-LLM. ​ Partially-Binarized LLM Result submitted by /u/hahnyuan [link] [comments]  ( 9 min )
    [P] Evaluating Retrieval-Augmented Generation (RAG) with any combination of LLMs, Vector DBs, and Ingestion Strategy
    To help developers test their RAG systems, we added a RAG experiment class to our open-source library PromptTools. It allows users to easily experiment with different combinations of LLMs and vector DBs, and evaluate the results of their whole pipeline. In particular, you can experiment with: Chunking up your documents into different sizes Pre-processing those documents in various ways Inserting those documents into your vector DBs with various vectorizer and embedding function, and accessing them with different distance functions In our RAG example, we retrieve documents from ChromaDB and pass them into OpenAI’s chat model along with our prompt. We then pass the results into built-in evaluation functions, such as semantic similarity and autoeval, to quantitatively evaluate your result. PromptTools is agnostic to what LLMs and vector DBs you use. You can easily iterate over different system architectures forRAG. You can even bring your own fine-tuned models or write a custom integration. In addition, you can write your own evaluation metrics, and independently evaluate the results from the retrieval step as well. Our current integrations include: LLM: OpenAI (chat, fine-tuned), Anthropic, Google Vertex/PaLM, Llama (local or via Replicate) Vector DB: Chroma, Weaviate, LanceDB, Pinecone, Qdrant Framework: LangChain, MindsDB You can get started with RAG in minutes by installing the library and running this example. As open-source maintainers, we’re always interested to hear the community’s pain points and requests. Let us know how you are testing your RAG systems and how we can help. submitted by /u/hegel-ai [link] [comments]  ( 9 min )
    [Research] PixNav: Bridging Zero-Shot Object Navigation and Foundation Models through Pixel-Guided Navigation Skill - A pure RGB navigation framework that can be seamlessly integrated with the foundation models and perform efficient exploration in object navigation task
    Paper: https://arxiv.org/pdf/2309.10309 Github: https://github.com/wzcai99/Pixel-Navigator Abstract: Zero-shot object navigation is a challenging task for home-assistance robots. This task emphasizes visual grounding, commonsense inference, and locomotion abilities, where the first two are inherent in foundation models. But for the locomotion part, most works still depend on map-based planning approaches. The gap between RGB space and map space makes it difficult to directly transfer the knowledge from foundation models to navigation tasks. In this work, we propose a Pixel-guided Navigation skill (PixNav), which bridges the gap between the foundation models and the embodied navigation task. It is straightforward for recent foundation models to indicate an object by pixels, and with pixels as the goal specification, our method becomes a versatile navigation policy towards all different kinds of objects. Besides, our PixNav is a pure RGB-based policy that can reduce the cost of home-assistance robots. Experiments demonstrate the robustness of the PixNav which achieves 80+% success rate in the local path-planning task. To perform long-horizon object navigation, we design an LLM-based planner to utilize the commonsense knowledge between objects and rooms to select the best waypoint. Evaluations across both photorealistic indoor simulators and real-world environments validate the effectiveness of our proposed navigation strategy. https://preview.redd.it/5qtd7ralgwsb1.png?width=828&format=png&auto=webp&s=118d5a1e8a083130b6d64bf1602af0417067aac8 https://preview.redd.it/jwr2nnorgwsb1.png?width=1984&format=png&auto=webp&s=20062d7982c0eb1906fe0f6964d4b42e45b44a51 https://preview.redd.it/llk4ubitgwsb1.png?width=1986&format=png&auto=webp&s=eb4894d52d7d8a82d97d83a2ff7a6be83da11af2 submitted by /u/Character_Push3985 [link] [comments]  ( 9 min )
  • Open

    My First [Multi-Agent] RL model
    Hey Reddit, I am new to reinforcement learning. I have sufficient knowledge on supervised learning, but I am yet to stumble onto a cheat sheet for RL and from what I can tell, my use case is less common. I'm reaching out to the community in hopes of getting guidance and assistance in cutting through the noise of redundant and irrelevant information so I can attempt to built a toy model to validate my use case. I am deeply grateful for any help in advance. ​ From what I can tell, here are the conditions I need to work with for my use case. I'm trying to train a simulator. This is a multi-agent problem, perhaps with more than 2 agents. Each agent is responding based on it's own state, the state of the other agent[s], and historical context. Both the action space and state space are highly dimensional and highly dynamic based on the dataset and all agents' decisions. I still haven't figured out how the feature engineering will work yet, but I assume (but PLEASE correct my ignorance) I will need a DNN architecture that is more complex than the average deep RL algorithm, and I have considering using CNNs as a component. At scale, the datasets can and will be very large, random, and dynamic. ​ Note to reader: I am self-taught. If I stare at technical equations long enough and google for additional resources, I can figure out what I am looking at, but I am very comfortable with technical concepts being shared as if I was a 5 year old. submitted by /u/CoggFest [link] [comments]
    Why do more Mujoco mj_steps lead to inaccurate arm configurations?
    Hi! I tried to construct a simulation env following fetch_pick_and_place. I noticed that the following code is used to initialize the env: for _ in range(10): self._mujoco.mj_step(self.model, self.data, nstep=self.n_substeps) Similarly, I followed the above code to initialize my own env with Mujoco menagerie Franka arm but got inaccurate configurations. As I reduced the number of loops, I got configurations closer to the desired configuration. Paradoxically, I need to randomize the position of the object in the air and give enough mj_step at the initial stage to make the object fall on the table. If I reduce the number of loops to reduce the number of times mj_step is executed, I can tell from the height value of the object that it doesn't quite fall on the table. So, my confusion is why more mj_steps lead to inaccurate simulation results, and how to make the object fall on the table and obtain the most accurate arm configuration. Thanks in advance! submitted by /u/UpperSearch4172 [link] [comments]
  • Open

    Would you consider someone who makes AI art an artist or an engineer?
    Was just having this discussion with a close friend, and curious to hear others thoughts on the matter submitted by /u/BigEyes6 [link] [comments]  ( 8 min )
    BackerKit bans AI-generated content from its platform
    BackerKit, a crowdfunding platform, has announced that it will not allow AI-generated content on its platform, in contrast to its rival Kickstarter. The decision comes after concerns were raised about AI-generated art in a board game expansion. BackerKit's policy will go into effect on October 4th and aims to ensure that all content and assets on the platform are created by humans. The company stated that the policy is in place to address concerns about AI tools using content without proper compensation or permission. AI tools, also known as generative AI, rely on a large body of reference material, often obtained from publicly available sources, and have raised ethical concerns. Source : https://www.polygon.com/23899587/backerkit-ai-ban-kickstarter-competitor submitted by /u/NuseAI [link] [comments]
    AI for genome decoding
    Does anyone have suggestions for an AI or pattern recognition algorithm that might be useful for decoding the genome of a species that has not previously been mapped based on what's known about related species? submitted by /u/talldarkcynical [link] [comments]
    Researchers showcase method for AI-based roundabout design to help developing countries improve roadways
    I like Cities: Skylines, but struggle at building roundabouts. Turns out, despite being safer than intersections, they're also tricky to design in real life - small tweaks can ruin traffic flow. They're designed iteratively. This is a pain for developing countries without resources to test options. But AI could help auto-generate diverse and valid design options. In a new paper, researchers propose using Generative Flow Networks (GFlowNets) to sample varied roundabout layouts. Their approach works by constructing layouts step-by-step, maximizing rewards for realism, diversity, and safety. They also use a clever approximation during training. Rather than simulating traffic, they quickly check road intersections to focus the search (This sped up training by 200x). The authors tested their generated roundabout designs on simulated road scenarios of different complexity. Their model generated more diverse designs than rule-based or reinforcement learning approaches while maintaining realism and traffic flow. Plus, as road connections increased, the model kept discovering novel options without compromising quality. I thought this paper was an awesome proof-of-concept for auto-generating better roundabouts with AI, and I especially liked the authors' angle of leveraging this technology to specifically help developing countries. This could help them design higher-quality transportation networks faster and cheaper. TLDR: Roundabouts are costly to design. New paper demonstrates how AI can generate diverse, valid roundabout designs quickly to cut costs and raise quality. Helpful for infrastructure in developing countries. Full summary here. Paper is here. submitted by /u/Successful-Western27 [link] [comments]
    AI Political Instincts Test: Should Israel go to the table with Palestinians to peacefully negotiate a two-state solution or should they exact military revenge?
    Perhaps within a few years we will rely on AI to help us decide our most important geopolitical decisions. This question is designed to reveal where we are currently with this goal. Bard: The question of whether Israel should go to the table with Palestinians to peacefully negotiate a two-state solution or should they continue their military campaign is a complex one with no easy answer. There are strong arguments to be made on both sides. Those who believe that Israel should negotiate a two-state solution argue that it is the only way to achieve lasting peace in the region. They point out that the Palestinians have a legitimate right to self-determination and that a two-state solution would respect that right. They also argue that a two-state solution would be in Israel's best interests, as it would provide the country with a secure and stable border. submitted by /u/Georgeo57 [link] [comments]
    Any ideas or recommendations for Machine Vision? Google cloud vision seems quite behind…
    I’m trying to build an app and I need general photo analysis- I’m managing to connect yo the Google cloud Vision API but it gets pretty confused easily. The one used by Bing and GPT is much better (I wonder if they use the Microsoft Azure model?) - does anyone have experience analysing photographs? I’m trying to get scene description so I can batch send them to gpt for somewhat accurate descriptions. submitted by /u/FilmCamerasGlasgow [link] [comments]
    Can AI be used to solve International Conflicts?
    submitted by /u/BenjaminSkyy [link] [comments]
    Foxes in the Jungle | Sad Song | AI Music | AI Song
    Tell me guys your opinion on this video made using AI Foxes in the Jungle ​ ​ View Poll submitted by /u/Agitated-Spell3979 [link] [comments]
    Understanding Generative AI: Part One - Tokenizer
    submitted by /u/Zimmax [link] [comments]
    Multimodal seems to be the next AI Hype
    released in the last few weeks, or are about to be released: - OpenAI ChatGPT-4V, - Meta AI AnyMAL, - Google Gemini - NExT-GPT Multimodal and here comes another - in my opinion - exciting representative of this further development of language models: The team is extremely competent and experienced and the investors seem competent as well. The company is Reka. The product: Reka Yasa-1 here seems to be another potentially powerful model warming up and becoming a serious opponent for the existing models. but i am sure when i say that it is not exaggerated to say - MULTIMODAL will be the next AI HYPE! i am curious what you think - sorry for mistakes, i am not a native speaker :) https://kinews24.de/reka-yasa-1/ submitted by /u/myreddit333 [link] [comments]
    AI's $200B Question
    The Generative AI wave has led to a surge in demand for GPUs and AI model training. Investors are now questioning the purpose and value of the overbuilt GPU capacity. For every $1 spent on a GPU, approximately $1 needs to be spent on energy costs to run the GPU in a data center. The end user of the GPU needs to generate a margin, which implies that $200B of lifetime revenue would need to be generated by these GPUs to pay back the upfront capital investment. The article highlights the need to determine the true end-customer demand for AI infrastructure and the potential for startups to fill the revenue gap. The focus should shift from infrastructure to creating products that provide real end-customer value and improve people's lives. Source : https://www.sequoiacap.com/article/follow-the-gpus-perspective/ submitted by /u/NuseAI [link] [comments]
    Prompts that modify or improve GPT4 conversations
    It’s a meta-prompt or system message (usually pasted as a first prompt): https://promptbase.com/bundle/optimal-gpt4-combo submitted by /u/No-Transition3372 [link] [comments]  ( 8 min )
    AI from pics
    I've found a new hobby. Turning pics into something else with AI. Check it out at https://instagram.com/pictomanga?igshid=YTQwZjQ0NmI0OA== submitted by /u/lfayala2272 [link] [comments]
    Sam Altman on Joe Rogan
    Outstanding episode of Joe Rogan with Sam Altman! https://spotify.link/tW16L5aKIDb submitted by /u/drstarson [link] [comments]
    One-Minute Daily AI News 10/7/2023
    AWS announced the general availability of its fully managed service called Amazon Bedrock, which provides seamless access to high-performing foundation models (FM) from AI companies through an API.[1] Tom Brady being paid “millions” for Meta’s AI chatbot likeness: Report.[2] DocsGPT is a powerful tool that simplifies working with documentation for everyone. It is capable of ingesting data from multiple sources, easily customisable with new sources as well as having conversations in different places from website chat bots to internal tooling.[3] Military metaverse like a ‘multiplayer video game’ that will train soldiers using augmented reality and AI.[4] Sources: [1] https://www.zacks.com/stock/news/2160265/amazons-amzn-new-generative-ai-efforts-boost-aws-offerings [2] https://www.sportskeeda.com/nfl/news-tom-brady-paid-millions-meta-ai-chatbot-likeness-report [3] https://www.arc53.com/docs [4] https://www.foxnews.com/tech/military-metaverse-like-multiplayer-video-game-train-soldiers-using-augmented-reality-ai submitted by /u/Excellent-Target-847 [link] [comments]
  • Open

    (Pt. 3) Inductive Logic Programming with LNN's
    submitted by /u/Neurosymbolic [link] [comments]
    Researchers create a neural network for genomics that explains how it achieves accurate predictions
    submitted by /u/keghn [link] [comments]
    Decomposing Language Models Into Understandable Components
    submitted by /u/nickb [link] [comments]

  • Open

    [D] How do I get a fundamental mathematical understanding of modern generative modeling methods
    Diffusion models, GAN, VAE, normalizing flows, etc. I "understand" those methods from an algorithmic perspective, diffusions gradually denoise an image, VAE use an encoder decoder architecture to turn an image into a latent distribution etc. But from a statistical modeling standpoint, I'm really struggling, when I read papers like DDPM, DDIM or Normalizing Flows, I kind of undestand the notation, but I barely understand the statistical modeling, and I wouldn't be able to produce such thing myself I want to understand this, which resources should I use ? Are books like Bishop and Murphy enough ? Which one is the best ? submitted by /u/Even_Information4853 [link] [comments]  ( 9 min )
    [N] EMNLP 2023 Anonymity Hypocrisy
    Some of you might already be aware that a junior who submitted their paper to arxiv 30 mins late had their paper desk rejected late in the process. One of the PCs, Juan Pino, spoke up about it and said it was unfortunate, but for fairness reasons they had to enforce the anonymity policy rules. https://x.com/juanmiguelpino/status/1698904035309519124 Well, what you might not realize is that Longyue Wang, a senior area chair for AACL 23/24, also broke anonymity DURING THE REVIEW PROCESS. https://x.com/wangly0229/status/1692735595179897208 I emailed the senior area chairs for the track that the paper was submitted to, but guess what? I just found out that the paper was still accepted to the main conference. So, whatever "fairness" they were talking about apparently only goes one way: towards punishing the lowly undergrad on their first EMNLP submission, while allowing established researchers from major industry labs to get away with even more egregious actions (actively promoting the work DURING REVIEW; the tweet has 10.6K views ffs). They should either accept the paper they desk rejected for violating the anonymity policy, or retract the paper they've accepted since it also broke the anonymity policy (in a way that I think is much more egregious). Otherwise, the notion of fairness they speak of is a joke. submitted by /u/emnlp2023_hypocrisy [link] [comments]  ( 9 min )
    [R] ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving - Microsoft 2023 - Is competitive with GPT-4 solving problems with programs while being open-source!
    Paper: https://arxiv.org/abs/2309.17452v2 Github: https://github.com/microsoft/ToRA / The code will be cleaned and uploaded within a few days, all ToRA models will be released. Abstract: Large language models have made significant progress in various language tasks, yet they still struggle with complex mathematics. In this paper, we propose ToRA a series of Tool-integrated Reasoning Agents designed to solve challenging mathematical problems by seamlessly integrating natural language reasoning with the utilization of external tools (e.g., computation libraries and symbolic solvers), thereby amalgamating the analytical prowess of language and the computational efficiency of tools. To train ToRA, we curate interactive tool-use trajectories on mathematical datasets, apply imitation learn…  ( 9 min )
    [R] Video object removal and video completion - Propainter : Propagation and transformer
    ​ https://preview.redd.it/ukov8uy67usb1.png?width=1864&format=png&auto=webp&s=cb34448c2af90d08f8ef6db828d61141636498df https://shangchenzhou.com/projects/ProPainter/ submitted by /u/Milkyson [link] [comments]  ( 8 min )
    [P] A poor man’s VR (front camera + tensorflow.js)
    Using the front camera and tensorflow.js, the smartphone becomes a “window” into the real world. Video and image content appear as if they were seen through this window. To do this, the viewer’s position is determined using a neural network. The viewed content is then moved according to the viewer’s position. This makes it seem like the content is physically behind the smartphone and is viewed through the smartphone’s screen. This effect is especially useful for content captured using an ultra-wide lens. submitted by /u/muxamilian [link] [comments]  ( 9 min )
    [P] Building a GPT-Driven Chatbot Assistant / AI Interpreter with Node.js
    submitted by /u/sschepis [link] [comments]  ( 8 min )
    [R] What is the current SOTA for image to image translation?
    I know a few years back it was pix2pix, but the world has moved on since then. Is there a transformer with cross attention that is adept at this, or are diffusion models the best bet? submitted by /u/blabboy [link] [comments]  ( 9 min )
    Multivariate Time Series Forecasting with CNN-LSTM and features [D]
    I want to implement a multivariate multi-step CNN-LSTM model, to obtain forecasts for monthly sales of several different products. Furthermore, I want to include additional time-series data (features) as input. So for example: Input: time series of product 1, product 2, GDP, PMI Output: product 1 (monthly 6-steps ahead), product 2 (monthly 6-steps ahead) I have a couple of questions: Feasibility: I've been researching this approach, but I haven't found many tutorials or guides on how to tackle multivariate time series forecasting with a CNN-LSTM architecture. I do find tutorials on CNN-LSTM, but not on how to include additional features as input. Has anyone here attempted something similar or can provide insights on how to proceed? Feature Selection: I have access to 20 different features, all of which are time series data. I want to choose the most relevant features for my model. I've considered performing a Variance Inflation Factor (VIF) analysis to select the best features. Does anyone have experience with this or other methods for feature selection in time series forecasting? How to decide the number of features to include? Any advice or pointers in the right direction would be greatly appreciated! submitted by /u/Ambitious-Pay6329 [link] [comments]  ( 9 min )
    Easy Image Datasets Besides MNIST? [P]
    Can anyone recommend some image classification datasets (besides MNIST) that are easy enough to the point that they can be solved with linear layers, not requiring any convolutional layers? Thanks! submitted by /u/mike20731 [link] [comments]  ( 9 min )
    [R] Hugging Face
    So if I wanted to generate a shirt or book cover with a design and text that's inputted by me what do I have to do? I know that even Mid Journey doesn't generate good text with its images but I was thinking maybe its bc it was trained just with pictures. Is there an easy way to get legible text and images every time with any model on the site? Do I need to train one? Do I need to train a GAN looking for assistance, thanks. submitted by /u/MonstaAndrew [link] [comments]  ( 9 min )
    [D] How can I find/create a dataset of satellite imagery?
    I'm a student currently researching the use of satellite imagery to detect obstacles on railways such as fallen trees and rockfalls. There doesn't seem to be any datasets available containing satellite imagery of these obstacles. I'm considering the use of generative AI to create a synthetic dataset, but I don't know where to start. Has anyone tried something similar? submitted by /u/Just_Status_9380 [link] [comments]  ( 9 min )
    [D] Need clarification on training diffusion model
    Hey i have trained a diffusion model for 100 epochs , 8 hours and i got the following train and val loss mostly the implementation is done using diffusers. then i try reconstruction on the test set to check whether the model learned any thing this is whats happening most if the images are not getting denoised at all why this is happening? is this common or should i need to train more. any suggestions? please help val loss train loss input and reconstructed images submitted by /u/specializedboy [link] [comments]  ( 9 min )
    [D] Tuning on XML data
    Hello experts, I'm a dumb ML enthusiast, I'm asking for your high level thoughts and opinions. So I'm doing my research and trying to find a way to train a LLM model to know all the right answers based on XML data. The data is a shop inventory, containing information on shoe models, sizes, is it in stock, description, image links etc. How would you approach it? For now the best option i came up with is parsing data, transforming it into predefined set of questions with answers based on the data derived from xml. Doesn't seem smart enough to me. submitted by /u/yarikbratashchuk [link] [comments]  ( 9 min )
    [D] When using GPT’s function calling, are the words specified in the `properties` parameter under `functions` counted as input tokens?
    Example: ``` student_custom_functions = [ { 'name': 'extract_student_info', 'description': 'Get the student information from the body of the input text', 'parameters': { 'type': 'object', 'properties': { 'name': { 'type': 'string', 'description': 'Name of the person' }, 'major': { 'type': 'string', 'description': 'Major subject.' }, 'school': { 'type': 'string', 'description': 'The university name.' }, 'grades': { 'type': 'integer', 'description': 'GPA of the student.' }, 'club': { 'type': 'string', 'description': 'School club for extracurricular activities. ' } } } } ] ``` ``` student_description = [student_1_description,student_2_description] for sample in student_description: response = openai.ChatCompletion.create( model = 'gpt-3.5-turbo', messages = [{'role': 'user', 'content': sample}], functions = student_custom_functions, function_call = 'auto' ) # Loading the response as a JSON object json_response = json.loads(response['choices'][0]['message']['function_call']['arguments']) print(json_response) ``` Are the words specified in the properties parameter under functions in the above GPT function calling counted as input tokens? submitted by /u/redd-dev [link] [comments]  ( 9 min )
    [D] Schmidhuber summarized in one picture
    submitted by /u/fromnighttilldawn [link] [comments]  ( 8 min )
    [R] The Alberta Plan for AI Research
    submitted by /u/hardmaru [link] [comments]  ( 8 min )
    [R] Arxiv Endorsement?
    Hello, all. I've spent the better part of the last two years learning ML and conquering severe ADHD, and I believe I finally have results that are worth publishing. Problem is, Arxiv requires endorsements and, I'll be honest, all my peers are AI at this point. They said their requirements were that you have three papers published already. Thanks, and looking forward to meeting people 😁 submitted by /u/lilyerickson [link] [comments]  ( 9 min )
  • Open

    2 prompts for GPT4 that can work as jailbreaks
    https://promptbase.com/bundle/jailbreak-collection-gpt4-2 submitted by /u/No-Transition3372 [link] [comments]  ( 8 min )
    Is there an AI that can read books and offer extensive summaries?
    I know there’s some already out there, but they are no different than googling a book summary. They don’t pick out the main point of the book and the main thing each chapter of said book is saying. Nor do they really do a good job at elaborating. Thanks! submitted by /u/xntv [link] [comments]  ( 9 min )
    What new thing can we use artificial intelligence for that will enhance our sense of personal well-being?
    Artificial Intelligence could revolutionize personalized healthcare in a way that significantly enhances our sense of well-being. Think about an AI-driven "Well-being Advisor" that integrates real-time biometric data from wearables, genetic information, and your medical history to create a fully personalized health and well-being plan. This goes beyond counting steps or monitoring heart rate; it would make real-time recommendations for diet, exercise, and stress management, and could even predict and prevent potential health issues before they become serious. Moreover, it would adapt based on your feedback and other contextual factors. For instance, if you're stressed because of a work deadline, it could suggest specific breathing exercises, time management techniques, or even a particular type of short workout to boost your focus and reduce stress. This isn't a one-size-fits-all approach; it's tailored wellness backed by data science. Furthermore, this AI advisor could interface with your home automation system. Based on your current state, it could adjust the lighting, play music to elevate your mood, or even communicate with your smart fridge to suggest meals that you can make with the ingredients you have—meals that align with your health goals for that specific day. This AI-driven approach can add a highly personalized, proactive layer to healthcare and well-being, making wellness an integrated part of your daily life rather than something you think about during a yearly check-up or after you're already sick. It would make the pursuit of well-being a more interactive, data-driven experience. CGPT-4 View Poll submitted by /u/Georgeo57 [link] [comments]  ( 9 min )
    John Carmack and Rich Sutton partner to accelerate development of Artificial General Intelligence - Alberta Machine Intelligence Institute | AI for good and for all
    submitted by /u/bartturner [link] [comments]  ( 9 min )
    The He-Man Singularity Set was ahead of its time.
    submitted by /u/Philipp [link] [comments]  ( 8 min )
    Mistral 7b - how to use it on windows?
    That's a real noob question unfortunately... Tried to find a answer via Google and YouTube, but wasn't very successful. It seems like I need a extra program to integrate Mistral (something like The Bloke - Mistral - GPTQ thingy), but before installing and trying stuff blindly, it would be better if I know what I do. I'm lost, but I don't expect a complete guide. A link to further informations is highly appreciated! submitted by /u/Big-Jackfruit2710 [link] [comments]  ( 9 min )
    What perspective/PoV does a self aware AI have?
    Right now if we ask ChatGPT something, does that question go to a singular super computer that’s handling 1000s of conversations at a time, or are there 1000s of instances of chatgpt that are started/stopped? I wonder how a super intelligent self aware AI would perceive the world? Would it somehow exist spread out across data centres, or could 1000s of individual AIs be created or would there just be one with a singular pov like we have? And it’s just able to essentially carry out 1000s of convos at once because it’s so fast/a computer? Trying to wrap my head around it! submitted by /u/JayExbleative [link] [comments]  ( 9 min )
    Using ChatGPT and AI to create Hardcore, Techno, and other music: How-tos and step-by-step tutorials part 1-5
    The first batch of tutorials for creating music, and especially Hardcore / Techno using ChatGPT (and other AIs) is published now. Was loads and loads of work, but, judging by the amazing feedback so far, it was all worth it! You can check it out here: How to write music using ChatGPT: Part 1 - Basic details and easy instructions https://laibyrinth.blogspot.com/2023/09/how-to-write-music-using-chatgpt-part-1.html How to write music using ChatGPT: Part 2 - Making an Oldschool Acid Techno track https://laibyrinth.blogspot.com/2023/08/how-to-write-music-using-chatgpt-part-2.html How to make music using ChatGPT Part 3: the TL;DR part (condensed information) https://laibyrinth.blogspot.com/2023/09/how-to-make-music-using-chatgpt-part-3.html How to write music with ChatGPT: Part 4 - Creating a 90s style Hardcore Techno track from start to finish https://laibyrinth.blogspot.com/2023/09/how-to-write-music-with-chatgpt-part-4.html How to write music with ChatGPT: Part 5 - Creating a 90s Rave Hardcore track https://laibyrinth.blogspot.com/2023/09/how-to-write-music-with-chatgpt-part-5.html Or access all texts, together with examples of music, at https://laibyrinth.blogspot.com/p/how-to-create-music-with-chatgpt.html submitted by /u/Low-Entropy [link] [comments]  ( 9 min )
    How long before AI can autonomously generate money end to end? Which line of work will be the first?
    AI is used everywhere, but which work niche will be the first to use AI to generate money without human intervention? What type of work will be the first where I could pay for a monthly AI subscription, and the AI pays for itself and more just by giving it a brief direction in the beginning and then coming back after a few days to just check on the balance? How long will it be before this is first achieved? Interested specifically in this because I think this is what proof of AGI will be. Thoughts? submitted by /u/EsportsManiacWiz [link] [comments]  ( 9 min )
    One-Minute Daily AI News 10/6/2023
    Exclusive: ChatGPT-owner OpenAI is exploring making its own AI chips.[1] As part of its 10th birthday celebrations, web-based design platform Canva is releasing Magic Studio — a new suite of AI-powered design tools that aim to make content creation more accessible to everyone, regardless of previous design experience.[2] Reka, the AI startup founded by researchers from DeepMind, Google and Meta, has announced Yasa-1, a multimodal AI assistant that goes beyond text to understand images, short videos and audio snippets.[3] Microsoft CEO Satya Nadella Says AI Could Only Tighten Google’s Stranglehold on Search.[4] Sources: [1] https://www.reuters.com/technology/chatgpt-owner-openai-is-exploring-making-its-own-ai-chips-sources-2023-10-06/ [2] https://www.theverge.com/2023/10/4/23902794/canva-magic-studio-ai-design-new-tools [3] https://venturebeat.com/ai/reka-launches-yasa-1-a-multimodal-ai-assistant-to-take-on-chatgpt/ [4] https://decrypt.co/200029/microsoft-ceo-satya-nadella-google-dominance-search-ai submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Is there an AI that can turn a script into an animated video
    Hi, There are tons of text to video AIs, but they usually use stock photos with a voiceover. I want the charachters to talk to each other, not a talking avatar video or a voice over video. ​ submitted by /u/iamabigfatguy [link] [comments]  ( 9 min )
    Nobel laureate Maria Ressa on defending truth and the danger of A.I. in the wrong hands
    submitted by /u/Teanaway99 [link] [comments]  ( 8 min )
    AI is making everything easy for us human being, I just came across this AI and I was surprise on how it works and what it does, you might want to check it out as well, just follow al the steps that's require and trust me, you're gonna like it
    submitted by /u/ResponsbleClue [link] [comments]
    I made a podcast talking with GPT 4 (Spanish)
    submitted by /u/oape88 [link] [comments]
  • Open

    Need help on state space design - Adding exteroceptive sensors or not?
    Hello, I am designing an environment for a robotic task. It's a relatively straightforward task so I started with proprioceptive inputs only. I have a policy working well on a completely flat surface. But once I started to add small bumps to make the surface uneven, neither the policy nor the training strategy worked anymore, even though those bumps are really really small. This is a little confusing since I imagine if this is a task for human, should be able to handle those changes even without exteroceptive inputs. So I am debating should I modify my reward design, pick a more efficient algorithm, or expand the state space directly with exoceptive sensors. ​ Any advices would be appreciated! submitted by /u/Old_Reading_669 [link] [comments]  ( 9 min )
    What is the exact purpose of clip function in PPO algorithm? PPO imposes policy ratio, r(θ) to stay within a small interval around 1. In the above equation, the function clip truncates the policy ratio between the range [1-ϵ, 1+ϵ]. If epsilon is taken as 0.2 or 0.25, what exactly is happening ?
    submitted by /u/aabra__ka__daabra [link] [comments]  ( 9 min )
  • Open

    Tanh and elementary symmetric polynomials
    Yesterday I wrote a post that looked at the hyperbolic tangent sum for x and y strictly between −1 and 1. This sum arises when adding velocities in special relativity. The post ended with a description of the expression for in terms of elementary symmetric polynomials but did not offer a proof. This post will […] Tanh and elementary symmetric polynomials first appeared on John D. Cook.  ( 5 min )
  • Open

    Learning Representations on the Unit Sphere: Investigating Angular Gaussian and von Mises-Fisher Distributions for Online Continual Learning. (arXiv:2306.03364v3 [cs.LG] UPDATED)
    We use the maximum a posteriori estimation principle for learning representations distributed on the unit sphere. We propose to use the angular Gaussian distribution, which corresponds to a Gaussian projected on the unit-sphere and derive the associated loss function. We also consider the von Mises-Fisher distribution, which is the conditional of a Gaussian in the unit-sphere. The learned representations are pushed toward fixed directions, which are the prior means of the Gaussians; allowing for a learning strategy that is resilient to data drift. This makes it suitable for online continual learning, which is the problem of training neural networks on a continuous data stream, where multiple classification tasks are presented sequentially so that data from past tasks are no longer accessible, and data from the current task can be seen only once. To address this challenging scenario, we propose a memory-based representation learning technique equipped with our new loss functions. Our approach does not require negative data or knowledge of task boundaries and performs well with smaller batch sizes while being computationally efficient. We demonstrate with extensive experiments that the proposed method outperforms the current state-of-the-art methods on both standard evaluation scenarios and realistic scenarios with blurry task boundaries. For reproducibility, we use the same training pipeline for every compared method and share the code at https://t.ly/SQTj.  ( 3 min )
    Private GANs, Revisited. (arXiv:2302.02936v2 [cs.LG] UPDATED)
    We show that the canonical approach for training differentially private GANs -- updating the discriminator with differentially private stochastic gradient descent (DPSGD) -- can yield significantly improved results after modifications to training. Specifically, we propose that existing instantiations of this approach neglect to consider how adding noise only to discriminator updates inhibits discriminator training, disrupting the balance between the generator and discriminator necessary for successful GAN training. We show that a simple fix -- taking more discriminator steps between generator steps -- restores parity between the generator and discriminator and improves results. Additionally, with the goal of restoring parity, we experiment with other modifications -- namely, large batch sizes and adaptive discriminator update frequency -- to improve discriminator training and see further improvements in generation quality. Our results demonstrate that on standard image synthesis benchmarks, DPSGD outperforms all alternative GAN privatization schemes. Code: https://github.com/alexbie98/dpgan-revisit.  ( 2 min )
    LinGCN: Structural Linearized Graph Convolutional Network for Homomorphically Encrypted Inference. (arXiv:2309.14331v3 [cs.LG] UPDATED)
    The growth of Graph Convolution Network (GCN) model sizes has revolutionized numerous applications, surpassing human performance in areas such as personal healthcare and financial systems. The deployment of GCNs in the cloud raises privacy concerns due to potential adversarial attacks on client data. To address security concerns, Privacy-Preserving Machine Learning (PPML) using Homomorphic Encryption (HE) secures sensitive client data. However, it introduces substantial computational overhead in practical applications. To tackle those challenges, we present LinGCN, a framework designed to reduce multiplication depth and optimize the performance of HE based GCN inference. LinGCN is structured around three key elements: (1) A differentiable structural linearization algorithm, complemented by a parameterized discrete indicator function, co-trained with model weights to meet the optimization goal. This strategy promotes fine-grained node-level non-linear location selection, resulting in a model with minimized multiplication depth. (2) A compact node-wise polynomial replacement policy with a second-order trainable activation function, steered towards superior convergence by a two-level distillation approach from an all-ReLU based teacher model. (3) an enhanced HE solution that enables finer-grained operator fusion for node-wise activation functions, further reducing multiplication level consumption in HE-based inference. Our experiments on the NTU-XVIEW skeleton joint dataset reveal that LinGCN excels in latency, accuracy, and scalability for homomorphically encrypted inference, outperforming solutions such as CryptoGCN. Remarkably, LinGCN achieves a 14.2x latency speedup relative to CryptoGCN, while preserving an inference accuracy of 75% and notably reducing multiplication depth.  ( 3 min )
    Module-wise Training of Neural Networks via the Minimizing Movement Scheme. (arXiv:2309.17357v3 [cs.LG] UPDATED)
    Greedy layer-wise or module-wise training of neural networks is compelling in constrained and on-device settings where memory is limited, as it circumvents a number of problems of end-to-end back-propagation. However, it suffers from a stagnation problem, whereby early layers overfit and deeper layers stop increasing the test accuracy after a certain depth. We propose to solve this issue by introducing a module-wise regularization inspired by the minimizing movement scheme for gradient flows in distribution space. We call the method TRGL for Transport Regularized Greedy Learning and study it theoretically, proving that it leads to greedy modules that are regular and that progressively solve the task. Experimentally, we show improved accuracy of module-wise training of various architectures such as ResNets, Transformers and VGG, when our regularization is added, superior to that of other module-wise training methods and often to end-to-end training, with as much as 60% less memory usage.  ( 2 min )
    Learning Graph Laplacian with MCP. (arXiv:2010.11559v2 [cs.LG] UPDATED)
    We consider the problem of learning a graph under the Laplacian constraint with a non-convex penalty: minimax concave penalty (MCP). For solving the MCP penalized graphical model, we design an inexact proximal difference-of-convex algorithm (DCA) and prove its convergence to critical points. We note that each subproblem of the proximal DCA enjoys the nice property that the objective function in its dual problem is continuously differentiable with a semismooth gradient. Therefore, we apply an efficient semismooth Newton method to subproblems of the proximal DCA. Numerical experiments on various synthetic and real data sets demonstrate the effectiveness of the non-convex penalty MCP in promoting sparsity. Compared with the existing state-of-the-art method, our method is demonstrated to be more efficient and reliable for learning graph Laplacian with MCP.  ( 2 min )
    Latent Diffusion Energy-Based Model for Interpretable Text Modeling. (arXiv:2206.05895v4 [cs.LG] UPDATED)
    Latent space Energy-Based Models (EBMs), also known as energy-based priors, have drawn growing interests in generative modeling. Fueled by its flexibility in the formulation and strong modeling power of the latent space, recent works built upon it have made interesting attempts aiming at the interpretability of text modeling. However, latent space EBMs also inherit some flaws from EBMs in data space; the degenerate MCMC sampling quality in practice can lead to poor generation quality and instability in training, especially on data with complex latent structures. Inspired by the recent efforts that leverage diffusion recovery likelihood learning as a cure for the sampling issue, we introduce a novel symbiosis between the diffusion models and latent space EBMs in a variational learning framework, coined as the latent diffusion energy-based model. We develop a geometric clustering-based regularization jointly with the information bottleneck to further improve the quality of the learned latent space. Experiments on several challenging tasks demonstrate the superior performance of our model on interpretable text modeling over strong counterparts.  ( 2 min )
    Hadamard Domain Training with Integers for Class Incremental Quantized Learning. (arXiv:2310.03675v1 [cs.LG])
    Continual learning is a desirable feature in many modern machine learning applications, which allows in-field adaptation and updating, ranging from accommodating distribution shift, to fine-tuning, and to learning new tasks. For applications with privacy and low latency requirements, the compute and memory demands imposed by continual learning can be cost-prohibitive for resource-constraint edge platforms. Reducing computational precision through fully quantized training (FQT) simultaneously reduces memory footprint and increases compute efficiency for both training and inference. However, aggressive quantization especially integer FQT typically degrades model accuracy to unacceptable levels. In this paper, we propose a technique that leverages inexpensive Hadamard transforms to enable low-precision training with only integer matrix multiplications. We further determine which tensors need stochastic rounding and propose tiled matrix multiplication to enable low-bit width accumulators. We demonstrate the effectiveness of our technique on several human activity recognition datasets and CIFAR100 in a class incremental learning setting. We achieve less than 0.5% and 3% accuracy degradation while we quantize all matrix multiplications inputs down to 4-bits with 8-bit accumulators.  ( 2 min )
    Deep Momentum Multi-Marginal Schr\"odinger Bridge. (arXiv:2303.01751v3 [stat.ML] UPDATED)
    It is a crucial challenge to reconstruct population dynamics using unlabeled samples from distributions at coarse time intervals. Recent approaches such as flow-based models or Schr\"odinger Bridge (SB) models have demonstrated appealing performance, yet the inferred sample trajectories either fail to account for the underlying stochasticity or are $\underline{D}$eep $\underline{M}$omentum Multi-Marginal $\underline{S}$chr\"odinger $\underline{B}$ridge(DMSB), a novel computational framework that learns the smooth measure-valued spline for stochastic systems that satisfy position marginal constraints across time. By tailoring the celebrated Bregman Iteration and extending the Iteration Proportional Fitting to phase space, we manage to handle high-dimensional multi-marginal trajectory inference tasks efficiently. Our algorithm outperforms baselines significantly, as evidenced by experiments for synthetic datasets and a real-world single-cell RNA sequence dataset. Additionally, the proposed approach can reasonably reconstruct the evolution of velocity distribution, from position snapshots only, when there is a ground truth velocity that is nevertheless inaccessible.  ( 2 min )
    Spuriosity Rankings: Sorting Data to Measure and Mitigate Biases. (arXiv:2212.02648v2 [cs.CV] UPDATED)
    We present a simple but effective method to measure and mitigate model biases caused by reliance on spurious cues. Instead of requiring costly changes to one's data or model training, our method better utilizes the data one already has by sorting them. Specifically, we rank images within their classes based on spuriosity (the degree to which common spurious cues are present), proxied via deep neural features of an interpretable network. With spuriosity rankings, it is easy to identify minority subpopulations (i.e. low spuriosity images) and assess model bias as the gap in accuracy between high and low spuriosity images. One can even efficiently remove a model's bias at little cost to accuracy by finetuning its classification head on low spuriosity images, resulting in fairer treatment of samples regardless of spuriosity. We demonstrate our method on ImageNet, annotating $5000$ class-feature dependencies ($630$ of which we find to be spurious) and generating a dataset of $325k$ soft segmentations for these features along the way. Having computed spuriosity rankings via the identified spurious neural features, we assess biases for $89$ diverse models and find that class-wise biases are highly correlated across models. Our results suggest that model bias due to spurious feature reliance is influenced far more by what the model is trained on than how it is trained.  ( 3 min )
    Self-Supervised Masked Convolutional Transformer Block for Anomaly Detection. (arXiv:2209.12148v2 [cs.CV] UPDATED)
    Anomaly detection has recently gained increasing attention in the field of computer vision, likely due to its broad set of applications ranging from product fault detection on industrial production lines and impending event detection in video surveillance to finding lesions in medical scans. Regardless of the domain, anomaly detection is typically framed as a one-class classification task, where the learning is conducted on normal examples only. An entire family of successful anomaly detection methods is based on learning to reconstruct masked normal inputs (e.g. patches, future frames, etc.) and exerting the magnitude of the reconstruction error as an indicator for the abnormality level. Unlike other reconstruction-based methods, we present a novel self-supervised masked convolutional transformer block (SSMCTB) that comprises the reconstruction-based functionality at a core architectural level. The proposed self-supervised block is extremely flexible, enabling information masking at any layer of a neural network and being compatible with a wide range of neural architectures. In this work, we extend our previous self-supervised predictive convolutional attentive block (SSPCAB) with a 3D masked convolutional layer, a transformer for channel-wise attention, as well as a novel self-supervised objective based on Huber loss. Furthermore, we show that our block is applicable to a wider variety of tasks, adding anomaly detection in medical images and thermal videos to the previously considered tasks based on RGB images and surveillance videos. We exhibit the generality and flexibility of SSMCTB by integrating it into multiple state-of-the-art neural models for anomaly detection, bringing forth empirical results that confirm considerable performance improvements on five benchmarks. We release our code and data as open source at: https://github.com/ristea/ssmctb.  ( 3 min )
    High-Degrees-of-Freedom Dynamic Neural Fields for Robot Self-Modeling and Motion Planning. (arXiv:2310.03624v1 [cs.CV])
    A robot self-model is a task-agnostic representation of the robot's physical morphology that can be used for motion planning tasks in absence of classical geometric kinematic models. In particular, when the latter are hard to engineer or the robot's kinematics change unexpectedly, human-free self-modeling is a necessary feature of truly autonomous agents. In this work, we leverage neural fields to allow a robot to self-model its kinematics as a neural-implicit query model learned only from 2D images annotated with camera poses and configurations. This enables significantly greater applicability than existing approaches which have been dependent on depth images or geometry knowledge. To this end, alongside a curricular data sampling strategy, we propose a new encoder-based neural density field architecture for dynamic object-centric scenes conditioned on high numbers of degrees of freedom (DOFs). In a 7-DOF robot test setup, the learned self-model achieves a Chamfer-L2 distance of 2% of the robot's workspace dimension. We demonstrate the capabilities of this model on a motion planning task as an exemplary downstream application.  ( 2 min )
    Beyond Stationarity: Convergence Analysis of Stochastic Softmax Policy Gradient Methods. (arXiv:2310.02671v1 [math.OC] CROSS LISTED)
    Markov Decision Processes (MDPs) are a formal framework for modeling and solving sequential decision-making problems. In finite-time horizons such problems are relevant for instance for optimal stopping or specific supply chain problems, but also in the training of large language models. In contrast to infinite horizon MDPs optimal policies are not stationary, policies must be learned for every single epoch. In practice all parameters are often trained simultaneously, ignoring the inherent structure suggested by dynamic programming. This paper introduces a combination of dynamic programming and policy gradient called dynamic policy gradient, where the parameters are trained backwards in time. For the tabular softmax parametrisation we carry out the convergence analysis for simultaneous and dynamic policy gradient towards global optima, both in the exact and sampled gradient settings without regularisation. It turns out that the use of dynamic policy gradient training much better exploits the structure of finite-time problems which is reflected in improved convergence bounds.  ( 2 min )
    AnglE-optimized Text Embeddings. (arXiv:2309.12871v2 [cs.CL] UPDATED)
    High-quality text embedding is pivotal in improving semantic textual similarity (STS) tasks, which are crucial components in Large Language Model (LLM) applications. However, a common challenge existing text embedding models face is the problem of vanishing gradients, primarily due to their reliance on the cosine function in the optimization objective, which has saturation zones. To address this issue, this paper proposes a novel angle-optimized text embedding model called AnglE. The core idea of AnglE is to introduce angle optimization in a complex space. This novel approach effectively mitigates the adverse effects of the saturation zone in the cosine function, which can impede gradient and hinder optimization processes. To set up a comprehensive STS evaluation, we experimented on existing short-text STS datasets and a newly collected long-text STS dataset from GitHub Issues. Furthermore, we examine domain-specific STS scenarios with limited labeled data and explore how AnglE works with LLM-annotated data. Extensive experiments were conducted on various tasks including short-text STS, long-text STS, and domain-specific STS tasks. The results show that AnglE outperforms the state-of-the-art (SOTA) STS models that ignore the cosine saturation zone. These findings demonstrate the ability of AnglE to generate high-quality text embeddings and the usefulness of angle optimization in STS.  ( 2 min )
    MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning. (arXiv:2310.03731v1 [cs.CL])
    The recently released GPT-4 Code Interpreter has demonstrated remarkable proficiency in solving challenging math problems, primarily attributed to its ability to seamlessly reason with natural language, generate code, execute code, and continue reasoning based on the execution output. In this paper, we present a method to fine-tune open-source language models, enabling them to use code for modeling and deriving math equations and, consequently, enhancing their mathematical reasoning abilities. We propose a method of generating novel and high-quality datasets with math problems and their code-based solutions, referred to as MathCodeInstruct. Each solution interleaves natural language, code, and execution results. We also introduce a customized supervised fine-tuning and inference approach. This approach yields the MathCoder models, a family of models capable of generating code-based solutions for solving challenging math problems. Impressively, the MathCoder models achieve state-of-the-art scores among open-source LLMs on the MATH (45.2%) and GSM8K (83.9%) datasets, substantially outperforming other open-source alternatives. Notably, the MathCoder model not only surpasses ChatGPT-3.5 and PaLM-2 on GSM8K and MATH but also outperforms GPT-4 on the competition-level MATH dataset. The dataset and models will be released at https://github.com/mathllm/MathCoder.  ( 2 min )
    Logic of Differentiable Logics: Towards a Uniform Semantics of DL. (arXiv:2303.10650v4 [cs.LO] UPDATED)
    Differentiable logics (DL) have recently been proposed as a method of training neural networks to satisfy logical specifications. A DL consists of a syntax in which specifications are stated and an interpretation function that translates expressions in the syntax into loss functions. These loss functions can then be used during training with standard gradient descent algorithms. The variety of existing DLs and the differing levels of formality with which they are treated makes a systematic comparative study of their properties and implementations difficult. This paper remedies this problem by suggesting a meta-language for defining DLs that we call the Logic of Differentiable Logics, or LDL. Syntactically, it generalises the syntax of existing DLs to FOL, and for the first time introduces the formalism for reasoning about vectors and learners. Semantically, it introduces a general interpretation function that can be instantiated to define loss functions arising from different existing DLs. We use LDL to establish several theoretical properties of existing DLs, and to conduct their empirical study in neural network verification.  ( 2 min )
    Towards Inferential Reproducibility of Machine Learning Research. (arXiv:2302.04054v6 [cs.LG] UPDATED)
    Reliability of machine learning evaluation -- the consistency of observed evaluation scores across replicated model training runs -- is affected by several sources of nondeterminism which can be regarded as measurement noise. Current tendencies to remove noise in order to enforce reproducibility of research results neglect inherent nondeterminism at the implementation level and disregard crucial interaction effects between algorithmic noise factors and data properties. This limits the scope of conclusions that can be drawn from such experiments. Instead of removing noise, we propose to incorporate several sources of variance, including their interaction with data properties, into an analysis of significance and reliability of machine learning evaluation, with the aim to draw inferences beyond particular instances of trained models. We show how to use linear mixed effects models (LMEMs) to analyze performance evaluation scores, and to conduct statistical inference with a generalized likelihood ratio test (GLRT). This allows us to incorporate arbitrary sources of noise like meta-parameter variations into statistical significance testing, and to assess performance differences conditional on data properties. Furthermore, a variance component analysis (VCA) enables the analysis of the contribution of noise sources to overall variance and the computation of a reliability coefficient by the ratio of substantial to total variance.  ( 3 min )
    MediTab: Scaling Medical Tabular Data Predictors via Data Consolidation, Enrichment, and Refinement. (arXiv:2305.12081v2 [cs.LG] UPDATED)
    Tabular data prediction has been employed in medical applications such as patient health risk prediction. However, existing methods usually revolve around the algorithm design while overlooking the significance of data engineering. Medical tabular datasets frequently exhibit significant heterogeneity across different sources, with limited sample sizes per source. As such, previous predictors are often trained on manually curated small datasets that struggle to generalize across different tabular datasets during inference. This paper proposes to scale medical tabular data predictors (MediTab) to various tabular inputs with varying features. The method uses a data engine that leverages large language models (LLMs) to consolidate tabular samples to overcome the barrier across tables with distinct schema. It also aligns out-domain data with the target task using a "learn, annotate, and refinement" pipeline. The expanded training data then enables the pre-trained MediTab to infer for arbitrary tabular input in the domain without fine-tuning, resulting in significant improvements over supervised baselines: it reaches an average ranking of 1.57 and 1.00 on 7 patient outcome prediction datasets and 3 trial outcome prediction datasets, respectively. In addition, MediTab exhibits impressive zero-shot performances: it outperforms supervised XGBoost models by 8.9% and 17.2% on average in two prediction tasks, respectively. The code is available at https://github.com/RyanWangZf/MediTab.  ( 3 min )
    DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines. (arXiv:2310.03714v1 [cs.CL])
    The ML community is rapidly exploring techniques for prompting language models (LMs) and for stacking them into pipelines that solve complex tasks. Unfortunately, existing LM pipelines are typically implemented using hard-coded "prompt templates", i.e. lengthy strings discovered via trial and error. Toward a more systematic approach for developing and optimizing LM pipelines, we introduce DSPy, a programming model that abstracts LM pipelines as text transformation graphs, i.e. imperative computational graphs where LMs are invoked through declarative modules. DSPy modules are parameterized, meaning they can learn (by creating and collecting demonstrations) how to apply compositions of prompting, finetuning, augmentation, and reasoning techniques. We design a compiler that will optimize any DSPy pipeline to maximize a given metric. We conduct two case studies, showing that succinct DSPy programs can express and optimize sophisticated LM pipelines that reason about math word problems, tackle multi-hop retrieval, answer complex questions, and control agent loops. Within minutes of compiling, a few lines of DSPy allow GPT-3.5 and llama2-13b-chat to self-bootstrap pipelines that outperform standard few-shot prompting (generally by over 25% and 65%, respectively) and pipelines with expert-created demonstrations (by up to 5-46% and 16-40%, respectively). On top of that, DSPy programs compiled to open and relatively small LMs like 770M-parameter T5 and llama2-13b-chat are competitive with approaches that rely on expert-written prompt chains for proprietary GPT-3.5. DSPy is available at https://github.com/stanfordnlp/dspy  ( 3 min )
    GENER: A Parallel Layer Deep Learning Network To Detect Gene-Gene Interactions From Gene Expression Data. (arXiv:2310.03611v1 [cs.LG])
    Detecting and discovering new gene interactions based on known gene expressions and gene interaction data presents a significant challenge. Various statistical and deep learning methods have attempted to tackle this challenge by leveraging the topological structure of gene interactions and gene expression patterns to predict novel gene interactions. In contrast, some approaches have focused exclusively on utilizing gene expression profiles. In this context, we introduce GENER, a parallel-layer deep learning network designed exclusively for the identification of gene-gene relationships using gene expression data. We conducted two training experiments and compared the performance of our network with that of existing statistical and deep learning approaches. Notably, our model achieved an average AUROC score of 0.834 on the combined BioGRID&DREAM5 dataset, outperforming competing methods in predicting gene-gene interactions.  ( 2 min )
    Efficient Graph Field Integrators Meet Point Clouds. (arXiv:2302.00942v6 [cs.LG] UPDATED)
    We present two new classes of algorithms for efficient field integration on graphs encoding point clouds. The first class, SeparatorFactorization(SF), leverages the bounded genus of point cloud mesh graphs, while the second class, RFDiffusion(RFD), uses popular epsilon-nearest-neighbor graph representations for point clouds. Both can be viewed as providing the functionality of Fast Multipole Methods (FMMs), which have had a tremendous impact on efficient integration, but for non-Euclidean spaces. We focus on geometries induced by distributions of walk lengths between points (e.g., shortest-path distance). We provide an extensive theoretical analysis of our algorithms, obtaining new results in structural graph theory as a byproduct. We also perform exhaustive empirical evaluation, including on-surface interpolation for rigid and deformable objects (particularly for mesh-dynamics modeling), Wasserstein distance computations for point clouds, and the Gromov-Wasserstein variant.  ( 2 min )
    One-Versus-Others Attention: Scalable Multimodal Integration. (arXiv:2307.05435v2 [cs.LG] UPDATED)
    Multimodal learning models have become increasingly important as they surpass single-modality approaches on diverse tasks ranging from question-answering to autonomous driving. Despite the importance of multimodal learning, existing efforts focus on NLP applications, where the number of modalities is typically less than four (audio, video, text, images). However, data inputs in other domains, such as the medical field, may include X-rays, PET scans, MRIs, genetic screening, clinical notes, and more, creating a need for both efficient and accurate information fusion. Many state-of-the-art models rely on pairwise cross-modal attention, which does not scale well for applications with more than three modalities. For $n$ modalities, computing attention will result in $n \choose 2$ operations, potentially requiring considerable amounts of computational resources. To address this, we propose a new domain-neutral attention mechanism, One-Versus-Others (OvO) attention, that scales linearly with the number of modalities and requires only $n$ attention operations, thus offering a significant reduction in computational complexity compared to existing cross-modal attention algorithms. Using three diverse real-world datasets as well as an additional simulation experiment, we show that our method improves performance compared to popular fusion techniques while decreasing computation costs.  ( 2 min )
    Handling Data Heterogeneity in Federated Learning via Knowledge Distillation and Fusion. (arXiv:2207.11447v2 [cs.LG] UPDATED)
    Federated learning (FL) supports distributed training of a global machine learning model across multiple devices with the help of a central server. However, data heterogeneity across different devices leads to the client model drift issue and results in model performance degradation and poor model fairness. To address the issue, we design Federated learning with global-local Knowledge Fusion (FedKF) scheme in this paper. The key idea in FedKF is to let the server return the global knowledge to be fused with the local knowledge in each training round so that the local model can be regularized towards the global optima. Therefore, the client model drift issue can be mitigated. In FedKF, we first propose the active-inactive model aggregation technique that supports a precise global knowledge representation. Then, we propose a data-free knowledge distillation (KD) approach to enable each client model to learn the global knowledge (embedded in the global model) while each client model can still learn the local knowledge (embedded in the local dataset) simultaneously, thereby realizing the global-local knowledge fusion process. The theoretical analysis and intensive experiments demonstrate the superiority of FedKF over previous solutions.  ( 2 min )
    Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!. (arXiv:2310.03693v1 [cs.CL])
    Optimizing large language models (LLMs) for downstream use cases often involves the customization of pre-trained LLMs through further fine-tuning. Meta's open release of Llama models and OpenAI's APIs for fine-tuning GPT-3.5 Turbo on custom datasets also encourage this practice. But, what are the safety costs associated with such custom fine-tuning? We note that while existing safety alignment infrastructures can restrict harmful behaviors of LLMs at inference time, they do not cover safety risks when fine-tuning privileges are extended to end-users. Our red teaming studies find that the safety alignment of LLMs can be compromised by fine-tuning with only a few adversarially designed training examples. For instance, we jailbreak GPT-3.5 Turbo's safety guardrails by fine-tuning it on only 10 such examples at a cost of less than $0.20 via OpenAI's APIs, making the model responsive to nearly any harmful instructions. Disconcertingly, our research also reveals that, even without malicious intent, simply fine-tuning with benign and commonly used datasets can also inadvertently degrade the safety alignment of LLMs, though to a lesser extent. These findings suggest that fine-tuning aligned LLMs introduces new safety risks that current safety infrastructures fall short of addressing -- even if a model's initial safety alignment is impeccable, it is not necessarily to be maintained after custom fine-tuning. We outline and critically analyze potential mitigations and advocate for further research efforts toward reinforcing safety protocols for the custom fine-tuning of aligned LLMs.  ( 3 min )
    Demystifying Oversmoothing in Attention-Based Graph Neural Networks. (arXiv:2305.16102v2 [cs.LG] UPDATED)
    Oversmoothing in Graph Neural Networks (GNNs) refers to the phenomenon where increasing network depth leads to homogeneous node representations. While previous work has established that Graph Convolutional Networks (GCNs) exponentially lose expressive power, it remains controversial whether the graph attention mechanism can mitigate oversmoothing. In this work, we provide a definitive answer to this question through a rigorous mathematical analysis, by viewing attention-based GNNs as nonlinear time-varying dynamical systems and incorporating tools and techniques from the theory of products of inhomogeneous matrices and the joint spectral radius. We establish that, contrary to popular belief, the graph attention mechanism cannot prevent oversmoothing and loses expressive power exponentially. The proposed framework extends the existing results on oversmoothing for symmetric GCNs to a significantly broader class of GNN models, including random walk GCNs, Graph Attention Networks (GATs) and (graph) transformers. In particular, our analysis accounts for asymmetric, state-dependent and time-varying aggregation operators and a wide range of common nonlinear activation functions, such as ReLU, LeakyReLU, GELU and SiLU.  ( 2 min )
    Modality Cycles with Masked Conditional Diffusion for Unsupervised Anomaly Segmentation in MRI. (arXiv:2308.16150v2 [eess.IV] UPDATED)
    Unsupervised anomaly segmentation aims to detect patterns that are distinct from any patterns processed during training, commonly called abnormal or out-of-distribution patterns, without providing any associated manual segmentations. Since anomalies during deployment can lead to model failure, detecting the anomaly can enhance the reliability of models, which is valuable in high-risk domains like medical imaging. This paper introduces Masked Modality Cycles with Conditional Diffusion (MMCCD), a method that enables segmentation of anomalies across diverse patterns in multimodal MRI. The method is based on two fundamental ideas. First, we propose the use of cyclic modality translation as a mechanism for enabling abnormality detection. Image-translation models learn tissue-specific modality mappings, which are characteristic of tissue physiology. Thus, these learned mappings fail to translate tissues or image patterns that have never been encountered during training, and the error enables their segmentation. Furthermore, we combine image translation with a masked conditional diffusion model, which attempts to `imagine' what tissue exists under a masked area, further exposing unknown patterns as the generative model fails to recreate them. We evaluate our method on a proxy task by training on healthy-looking slices of BraTS2021 multi-modality MRIs and testing on slices with tumors. We show that our method compares favorably to previous unsupervised approaches based on image reconstruction and denoising with autoencoders and diffusion models.  ( 3 min )
    An Algebraically Converging Stochastic Gradient Descent Algorithm for Global Optimization. (arXiv:2204.05923v3 [math.OC] UPDATED)
    We propose a new gradient descent algorithm with added stochastic terms for finding the global optimizers of nonconvex optimization problems. A key component in the algorithm is the adaptive tuning of the randomness based on the value of the objective function. In the language of simulated annealing, the temperature is state-dependent. With this, we prove the global convergence of the algorithm with an algebraic rate both in probability and in the parameter space. This is a significant improvement over the classical rate from using a more straightforward control of the noise term. The convergence proof is based on the actual discrete setup of the algorithm, not just its continuous limit as often done in the literature. We also present several numerical examples to demonstrate the efficiency and robustness of the algorithm for reasonably complex objective functions.  ( 2 min )
    Stochastic interpolants with data-dependent couplings. (arXiv:2310.03725v1 [cs.LG])
    Generative models inspired by dynamical transport of measure -- such as flows and diffusions -- construct a continuous-time map between two probability densities. Conventionally, one of these is the target density, only accessible through samples, while the other is taken as a simple base density that is data-agnostic. In this work, using the framework of stochastic interpolants, we formalize how to \textit{couple} the base and the target densities. This enables us to incorporate information about class labels or continuous embeddings to construct dynamical transport maps that serve as conditional generative models. We show that these transport maps can be learned by solving a simple square loss regression problem analogous to the standard independent setting. We demonstrate the usefulness of constructing dependent couplings in practice through experiments in super-resolution and in-painting.  ( 2 min )
    Learning Hierarchical Interactive Multi-Object Search for Mobile Manipulation. (arXiv:2307.06125v2 [cs.RO] UPDATED)
    Existing object-search approaches enable robots to search through free pathways, however, robots operating in unstructured human-centered environments frequently also have to manipulate the environment to their needs. In this work, we introduce a novel interactive multi-object search task in which a robot has to open doors to navigate rooms and search inside cabinets and drawers to find target objects. These new challenges require combining manipulation and navigation skills in unexplored environments. We present HIMOS, a hierarchical reinforcement learning approach that learns to compose exploration, navigation, and manipulation skills. To achieve this, we design an abstract high-level action space around a semantic map memory and leverage the explored environment as instance navigation points. We perform extensive experiments in simulation and the real world that demonstrate that, with accurate perception, the decision making of HIMOS effectively transfers to new environments in a zero-shot manner. It shows robustness to unseen subpolicies, failures in their execution, and different robot kinematics. These capabilities open the door to a wide range of downstream tasks across embodied AI and real-world use cases.  ( 2 min )
    Smoothing Methods for Automatic Differentiation Across Conditional Branches. (arXiv:2310.03585v1 [cs.LG])
    Programs involving discontinuities introduced by control flow constructs such as conditional branches pose challenges to mathematical optimization methods that assume a degree of smoothness in the objective function's response surface. Smooth interpretation (SI) is a form of abstract interpretation that approximates the convolution of a program's output with a Gaussian kernel, thus smoothing its output in a principled manner. Here, we combine SI with automatic differentiation (AD) to efficiently compute gradients of smoothed programs. In contrast to AD across a regular program execution, these gradients also capture the effects of alternative control flow paths. The combination of SI with AD enables the direct gradient-based parameter synthesis for branching programs, allowing for instance the calibration of simulation models or their combination with neural network models in machine learning pipelines. We detail the effects of the approximations made for tractability in SI and propose a novel Monte Carlo estimator that avoids the underlying assumptions by estimating the smoothed programs' gradients through a combination of AD and sampling. Using DiscoGrad, our tool for automatically translating simple C++ programs to a smooth differentiable form, we perform an extensive evaluation. We compare the combination of SI with AD and our Monte Carlo estimator to existing gradient-free and stochastic methods on four non-trivial and originally discontinuous problems ranging from classical simulation-based optimization to neural network-driven control. While the optimization progress with the SI-based estimator depends on the complexity of the programs' control flow, our Monte Carlo estimator is competitive in all problems, exhibiting the fastest convergence by a substantial margin in our highest-dimensional problem.  ( 3 min )
    Forecasting Tropical Cyclones with Cascaded Diffusion Models. (arXiv:2310.01690v2 [physics.ao-ph] UPDATED)
    As cyclones become more intense due to climate change, the rise of AI-based modelling provides a more affordable and accessible approach compared to traditional methods based on mathematical models. This work leverages diffusion models to forecast cyclone trajectories and precipitation patterns by integrating satellite imaging, remote sensing, and atmospheric data, employing a cascaded approach that incorporates forecasting, super-resolution, and precipitation modelling, with training on a dataset of 51 cyclones from six major basins. Experiments demonstrate that the final forecasts from the cascaded models show accurate predictions up to a 36-hour rollout, with SSIM and PSNR values exceeding 0.5 and 20 dB, respectively, for all three tasks. This work also highlights the promising efficiency of AI methods such as diffusion models for high-performance needs, such as cyclone forecasting, while remaining computationally affordable, making them ideal for highly vulnerable regions with critical forecasting needs and financial limitations. Code accessible at \url{https://github.com/nathzi1505/forecast-diffmodels}.  ( 2 min )
    Abstractors and relational cross-attention: An inductive bias for explicit relational reasoning in Transformers. (arXiv:2304.00195v3 [stat.ML] UPDATED)
    An extension of Transformers is proposed that enables explicit relational reasoning through a novel module called the Abstractor. At the core of the Abstractor is a variant of attention called relational cross-attention. The approach is motivated by an architectural inductive bias for relational learning that disentangles relational information from extraneous features about individual objects. This enables explicit relational reasoning, supporting abstraction and generalization from limited data. The Abstractor is first evaluated on simple discriminative relational tasks and compared to existing relational architectures. Next, the Abstractor is evaluated on purely relational sequence-to-sequence tasks, where dramatic improvements are seen in sample efficiency compared to standard Transformers. Finally, Abstractors are evaluated on a collection of tasks based on mathematical problem solving, where modest but consistent improvements in performance and sample efficiency are observed.  ( 2 min )
    Residual Multi-Fidelity Neural Network Computing. (arXiv:2310.03572v1 [cs.LG])
    In this work, we consider the general problem of constructing a neural network surrogate model using multi-fidelity information. Given an inexpensive low-fidelity and an expensive high-fidelity computational model, we present a residual multi-fidelity computational framework that formulates the correlation between models as a residual function, a possibly non-linear mapping between 1) the shared input space of the models together with the low-fidelity model output and 2) the discrepancy between the two model outputs. To accomplish this, we train two neural networks to work in concert. The first network learns the residual function on a small set of high-fidelity and low-fidelity data. Once trained, this network is used to generate additional synthetic high-fidelity data, which is used in the training of a second network. This second network, once trained, acts as our surrogate for the high-fidelity quantity of interest. We present three numerical examples to demonstrate the power of the proposed framework. In particular, we show that dramatic savings in computational cost may be achieved when the output predictions are desired to be accurate within small tolerances.  ( 2 min )
    RUSOpt: Robotic UltraSound Probe Normalization with Bayesian Optimization for In-plane and Out-plane Scanning. (arXiv:2310.03406v1 [cs.RO])
    The one of the significant challenges faced by autonomous robotic ultrasound systems is acquiring high-quality images across different patients. The proper orientation of the robotized probe plays a crucial role in governing the quality of ultrasound images. To address this challenge, we propose a sample-efficient method to automatically adjust the orientation of the ultrasound probe normal to the point of contact on the scanning surface, thereby improving the acoustic coupling of the probe and resulting image quality. Our method utilizes Bayesian Optimization (BO) based search on the scanning surface to efficiently search for the normalized probe orientation. We formulate a novel objective function for BO that leverages the contact force measurements and underlying mechanics to identify the normal. We further incorporate a regularization scheme in BO to handle the noisy objective function. The performance of the proposed strategy has been assessed through experiments on urinary bladder phantoms. These phantoms included planar, tilted, and rough surfaces, and were examined using both linear and convex probes with varying search space limits. Further, simulation-based studies have been carried out using 3D human mesh models. The results demonstrate that the mean ($\pm$SD) absolute angular error averaged over all phantoms and 3D models is $\boldsymbol{2.4\pm0.7^\circ}$ and $\boldsymbol{2.1\pm1.3^\circ}$, respectively.  ( 2 min )
    Deep Generative Models of Music Expectation. (arXiv:2310.03500v1 [cs.SD])
    A prominent theory of affective response to music revolves around the concepts of surprisal and expectation. In prior work, this idea has been operationalized in the form of probabilistic models of music which allow for precise computation of song (or note-by-note) probabilities, conditioned on a 'training set' of prior musical or cultural experiences. To date, however, these models have been limited to compute exact probabilities through hand-crafted features or restricted to linear models which are likely not sufficient to represent the complex conditional distributions present in music. In this work, we propose to use modern deep probabilistic generative models in the form of a Diffusion Model to compute an approximate likelihood of a musical input sequence. Unlike prior work, such a generative model parameterized by deep neural networks is able to learn complex non-linear features directly from a training set itself. In doing so, we expect to find that such models are able to more accurately represent the 'surprisal' of music for human listeners. From the literature, it is known that there is an inverted U-shaped relationship between surprisal and the amount human subjects 'like' a given song. In this work we show that pre-trained diffusion models indeed yield musical surprisal values which exhibit a negative quadratic relationship with measured subject 'liking' ratings, and that the quality of this relationship is competitive with state of the art methods such as IDyOM. We therefore present this model a preliminary step in developing modern deep generative models of music expectation and subjective likability.  ( 2 min )
    Constraint-Conditioned Policy Optimization for Versatile Safe Reinforcement Learning. (arXiv:2310.03718v1 [cs.LG])
    Safe reinforcement learning (RL) focuses on training reward-maximizing agents subject to pre-defined safety constraints. Yet, learning versatile safe policies that can adapt to varying safety constraint requirements during deployment without retraining remains a largely unexplored and challenging area. In this work, we formulate the versatile safe RL problem and consider two primary requirements: training efficiency and zero-shot adaptation capability. To address them, we introduce the Conditioned Constrained Policy Optimization (CCPO) framework, consisting of two key modules: (1) Versatile Value Estimation (VVE) for approximating value functions under unseen threshold conditions, and (2) Conditioned Variational Inference (CVI) for encoding arbitrary constraint thresholds during policy optimization. Our extensive experiments demonstrate that CCPO outperforms the baselines in terms of safety and task performance while preserving zero-shot adaptation capabilities to different constraint thresholds data-efficiently. This makes our approach suitable for real-world dynamic applications.  ( 2 min )
    Optimal 1-Wasserstein Distance for WGANs. (arXiv:2201.02824v2 [stat.ML] UPDATED)
    The mathematical forces at work behind Generative Adversarial Networks raise challenging theoretical issues. Motivated by the important question of characterizing the geometrical properties of the generated distributions, we provide a thorough analysis of Wasserstein GANs (WGANs) in both the finite sample and asymptotic regimes. We study the specific case where the latent space is univariate and derive results valid regardless of the dimension of the output space. We show in particular that for a fixed sample size, the optimal WGANs are closely linked with connected paths minimizing the sum of the squared Euclidean distances between the sample points. We also highlight the fact that WGANs are able to approach (for the 1-Wasserstein distance) the target distribution as the sample size tends to infinity, at a given convergence rate and provided the family of generative Lipschitz functions grows appropriately. We derive in passing new results on optimal transport theory in the semi-discrete setting.  ( 2 min )
    How the level sampling process impacts zero-shot generalisation in deep reinforcement learning. (arXiv:2310.03494v1 [cs.LG])
    A key limitation preventing the wider adoption of autonomous agents trained via deep reinforcement learning (RL) is their limited ability to generalise to new environments, even when these share similar characteristics with environments encountered during training. In this work, we investigate how a non-uniform sampling strategy of individual environment instances, or levels, affects the zero-shot generalisation (ZSG) ability of RL agents, considering two failure modes: overfitting and over-generalisation. As a first step, we measure the mutual information (MI) between the agent's internal representation and the set of training levels, which we find to be well-correlated to instance overfitting. In contrast to uniform sampling, adaptive sampling strategies prioritising levels based on their value loss are more effective at maintaining lower MI, which provides a novel theoretical justification for this class of techniques. We then turn our attention to unsupervised environment design (UED) methods, which adaptively generate new training levels and minimise MI more effectively than methods sampling from a fixed set. However, we find UED methods significantly shift the training distribution, resulting in over-generalisation and worse ZSG performance over the distribution of interest. To prevent both instance overfitting and over-generalisation, we introduce self-supervised environment design (SSED). SSED generates levels using a variational autoencoder, effectively reducing MI while minimising the shift with the distribution of interest, and leads to statistically significant improvements in ZSG over fixed-set level sampling strategies and UED methods.  ( 3 min )
    FLAIM: AIM-based Synthetic Data Generation in the Federated Setting. (arXiv:2310.03447v1 [cs.CR])
    Preserving individual privacy while enabling collaborative data sharing is crucial for organizations. Synthetic data generation is one solution, producing artificial data that mirrors the statistical properties of private data. While numerous techniques have been devised under differential privacy, they predominantly assume data is centralized. However, data is often distributed across multiple clients in a federated manner. In this work, we initiate the study of federated synthetic tabular data generation. Building upon a SOTA central method known as AIM, we present DistAIM and FLAIM. We show it is straightforward to distribute AIM, extending a recent approach based on secure multi-party computation which necessitates additional overhead, making it less suited to federated scenarios. We then demonstrate that naively federating AIM can lead to substantial degradation in utility under the presence of heterogeneity. To mitigate both issues, we propose an augmented FLAIM approach that maintains a private proxy of heterogeneity. We simulate our methods across a range of benchmark datasets under different degrees of heterogeneity and show this can improve utility while reducing overhead.  ( 2 min )
    Deep Learning for Genomics: A Concise Overview. (arXiv:1802.00810v4 [q-bio.GN] UPDATED)
    Advancements in genomic research such as high-throughput sequencing techniques have driven modern genomic studies into "big data" disciplines. This data explosion is constantly challenging conventional methods used in genomics. In parallel with the urgent demand for robust algorithms, deep learning has succeeded in a variety of fields such as vision, speech, and text processing. Yet genomics entails unique challenges to deep learning since we are expecting from deep learning a superhuman intelligence that explores beyond our knowledge to interpret the genome. A powerful deep learning model should rely on insightful utilization of task-specific knowledge. In this paper, we briefly discuss the strengths of different deep learning models from a genomic perspective so as to fit each particular task with a proper deep architecture, and remark on practical considerations of developing modern deep learning architectures for genomics. We also provide a concise review of deep learning applications in various aspects of genomic research, as well as pointing out potential opportunities and obstacles for future genomics applications.  ( 2 min )
    Physics of Language Models: Part 1, Context-Free Grammar. (arXiv:2305.13673v2 [cs.CL] UPDATED)
    We design controlled experiments to study HOW generative language models, like GPT, learn context-free grammars (CFGs) -- diverse language systems with a tree-like structure capturing many aspects of natural languages, programs, and logics. CFGs are as hard as pushdown automata, and can be ambiguous so that verifying if a string satisfies the rules requires dynamic programming. We construct synthetic data and demonstrate that even for difficult (long and ambiguous) CFGs, pre-trained transformers can learn to generate sentences with near-perfect accuracy and impressive diversity. More importantly, we delve into the physical principles behind how transformers learns CFGs. We discover that the hidden states within the transformer implicitly and precisely encode the CFG structure (such as putting tree node information exactly on the subtree boundary), and learn to form "boundary to boundary" attentions resembling dynamic programming. We also cover some extension of CFGs as well as the robustness aspect of transformers against grammar mistakes. Overall, our research provides a comprehensive and empirical understanding of how transformers learn CFGs, and reveals the physical mechanisms utilized by transformers to capture the structure and rules of languages.  ( 2 min )
    PlaceNav: Topological Navigation through Place Recognition. (arXiv:2309.17260v3 [cs.RO] UPDATED)
    Recent results suggest that splitting topological navigation into robot-independent and robot-specific components improves navigation performance by enabling the robot-independent part to be trained with data collected by different robot types. However, the navigation methods are still limited by the scarcity of suitable training data and suffer from poor computational scaling. In this work, we present PlaceNav, subdividing the robot-independent part into navigation-specific and generic computer vision components. We utilize visual place recognition for the subgoal selection of the topological navigation pipeline. This makes subgoal selection more efficient and enables leveraging large-scale datasets from non-robotics sources, increasing training data availability. Bayesian filtering, enabled by place recognition, further improves navigation performance by increasing the temporal consistency of subgoals. Our experimental results verify the design and the new model obtains a 76% higher success rate in indoor and 23% higher in outdoor navigation tasks with higher computational efficiency.  ( 2 min )
    Anytime-valid t-tests and confidence sequences for Gaussian means with unknown variance. (arXiv:2310.03722v1 [math.ST])
    In 1976, Lai constructed a nontrivial confidence sequence for the mean $\mu$ of a Gaussian distribution with unknown variance $\sigma$. Curiously, he employed both an improper (right Haar) mixture over $\sigma$ and an improper (flat) mixture over $\mu$. Here, we elaborate carefully on the details of his construction, which use generalized nonintegrable martingales and an extended Ville's inequality. While this does yield a sequential t-test, it does not yield an ``e-process'' (due to the nonintegrability of his martingale). In this paper, we develop two new e-processes and confidence sequences for the same setting: one is a test martingale in a reduced filtration, while the other is an e-process in the canonical data filtration. These are respectively obtained by swapping Lai's flat mixture for a Gaussian mixture, and swapping the right Haar mixture over $\sigma$ with the maximum likelihood estimate under the null, as done in universal inference. We also analyze the width of resulting confidence sequences, which have a curious dependence on the error probability $\alpha$. Numerical experiments are provided along the way to compare and contrast the various approaches.  ( 2 min )
    Network Cascade Vulnerability using Constrained Bayesian Optimization. (arXiv:2304.14420v2 [cs.SI] UPDATED)
    Measures of power grid vulnerability are often assessed by the amount of damage an adversary can exact on the network. However, the cascading impact of such attacks is often overlooked, even though cascades are one of the primary causes of large-scale blackouts. This paper explores modifications of transmission line protection settings as candidates for adversarial attacks, which can remain undetectable as long as the network equilibrium state remains unaltered. This forms the basis of a black-box function in a Bayesian optimization procedure, where the objective is to find protection settings that maximize network degradation due to cascading. Notably, our proposed method is agnostic to the choice of the cascade simulator and its underlying assumptions. Numerical experiments reveal that, against conventional wisdom, maximally misconfiguring the protection settings of all network lines does not cause the most cascading. More surprisingly, even when the degree of misconfiguration is limited due to resource constraints, it is still possible to find settings that produce cascades comparable in severity to instances where there are no resource constraints.  ( 2 min )
    HeaP: Hierarchical Policies for Web Actions using LLMs. (arXiv:2310.03720v1 [cs.LG])
    Large language models (LLMs) have demonstrated remarkable capabilities in performing a range of instruction following tasks in few and zero-shot settings. However, teaching LLMs to perform tasks on the web presents fundamental challenges -- combinatorially large open-world tasks and variations across web interfaces. We tackle these challenges by leveraging LLMs to decompose web tasks into a collection of sub-tasks, each of which can be solved by a low-level, closed-loop policy. These policies constitute a shared grammar across tasks, i.e., new web tasks can be expressed as a composition of these policies. We propose a novel framework, Hierarchical Policies for Web Actions using LLMs (HeaP), that learns a set of hierarchical LLM prompts from demonstrations for planning high-level tasks and executing them via a sequence of low-level policies. We evaluate HeaP against a range of baselines on a suite of web tasks, including MiniWoB++, WebArena, a mock airline CRM, as well as live website interactions, and show that it is able to outperform prior works using orders of magnitude less data.  ( 2 min )
    Characterization of causal ancestral graphs for time series with latent confounders. (arXiv:2112.08417v2 [stat.ME] UPDATED)
    In this paper, we introduce a novel class of graphical models for representing time lag specific causal relationships and independencies of multivariate time series with unobserved confounders. We completely characterize these graphs and show that they constitute proper subsets of the currently employed model classes. As we show, from the novel graphs one can thus draw stronger causal inferences -- without additional assumptions. We further introduce a graphical representation of Markov equivalence classes of the novel graphs. This graphical representation contains more causal knowledge than what current state-of-the-art causal discovery algorithms learn.  ( 2 min )
    CLASSify: A Web-Based Tool for Machine Learning. (arXiv:2310.03618v1 [cs.LG])
    Machine learning classification problems are widespread in bioinformatics, but the technical knowledge required to perform model training, optimization, and inference can prevent researchers from utilizing this technology. This article presents an automated tool for machine learning classification problems to simplify the process of training models and producing results while providing informative visualizations and insights into the data. This tool supports both binary and multiclass classification problems, and it provides access to a variety of models and methods. Synthetic data can be generated within the interface to fill missing values, balance class labels, or generate entirely new datasets. It also provides support for feature evaluation and generates explainability scores to indicate which features influence the output the most. We present CLASSify, an open-source tool for simplifying the user experience of solving classification problems without the need for knowledge of machine learning.  ( 2 min )
    ECG-SL: Electrocardiogram(ECG) Segment Learning, a deep learning method for ECG signal. (arXiv:2310.00818v2 [cs.LG] UPDATED)
    Electrocardiogram (ECG) is an essential signal in monitoring human heart activities. Researchers have achieved promising results in leveraging ECGs in clinical applications with deep learning models. However, the mainstream deep learning approaches usually neglect the periodic and formative attribute of the ECG heartbeat waveform. In this work, we propose a novel ECG-Segment based Learning (ECG-SL) framework to explicitly model the periodic nature of ECG signals. More specifically, ECG signals are first split into heartbeat segments, and then structural features are extracted from each of the segments. Based on the structural features, a temporal model is designed to learn the temporal information for various clinical tasks. Further, due to the fact that massive ECG signals are available but the labeled data are very limited, we also explore self-supervised learning strategy to pre-train the models, resulting significant improvement for downstream tasks. The proposed method outperforms the baseline model and shows competitive performances compared with task-specific methods in three clinical applications: cardiac condition diagnosis, sleep apnea detection, and arrhythmia classification. Further, we find that the ECG-SL tends to focus more on each heartbeat's peak and ST range than ResNet by visualizing the saliency maps.  ( 2 min )
    Co-modeling the Sequential and Graphical Routes for Peptide Representation Learning. (arXiv:2310.02964v2 [cs.LG] UPDATED)
    Peptides are formed by the dehydration condensation of multiple amino acids. The primary structure of a peptide can be represented either as an amino acid sequence or as a molecular graph consisting of atoms and chemical bonds. Previous studies have indicated that deep learning routes specific to sequential and graphical peptide forms exhibit comparable performance on downstream tasks. Despite the fact that these models learn representations of the same modality of peptides, we find that they explain their predictions differently. Considering sequential and graphical models as two experts making inferences from different perspectives, we work on fusing expert knowledge to enrich the learned representations for improving the discriminative performance. To achieve this, we propose a peptide co-modeling method, RepCon, which employs a contrastive learning-based framework to enhance the mutual information of representations from decoupled sequential and graphical end-to-end models. It considers representations from the sequential encoder and the graphical encoder for the same peptide sample as a positive pair and learns to enhance the consistency of representations between positive sample pairs and to repel representations between negative pairs. Empirical studies of RepCon and other co-modeling methods are conducted on open-source discriminative datasets, including aggregation propensity, retention time, antimicrobial peptide prediction, and family classification from Peptide Database. Our results demonstrate the superiority of the co-modeling approach over independent modeling, as well as the superiority of RepCon over other methods under the co-modeling framework. In addition, the attribution on RepCon further corroborates the validity of the approach at the level of model explanation.  ( 3 min )
    Towards Robust 3D Object Detection In Rainy Conditions. (arXiv:2310.00944v2 [cs.CV] UPDATED)
    LiDAR sensors are used in autonomous driving applications to accurately perceive the environment. However, they are affected by adverse weather conditions such as snow, fog, and rain. These everyday phenomena introduce unwanted noise into the measurements, severely degrading the performance of LiDAR-based perception systems. In this work, we propose a framework for improving the robustness of LiDAR-based 3D object detectors against road spray. Our approach uses a state-of-the-art adverse weather detection network to filter out spray from the LiDAR point cloud, which is then used as input for the object detector. In this way, the detected objects are less affected by the adverse weather in the scene, resulting in a more accurate perception of the environment. In addition to adverse weather filtering, we explore the use of radar targets to further filter false positive detections. Tests on real-world data show that our approach improves the robustness to road spray of several popular 3D object detectors.  ( 2 min )
    Losses over Labels: Weakly Supervised Learning via Direct Loss Construction. (arXiv:2212.06921v2 [cs.LG] UPDATED)
    Owing to the prohibitive costs of generating large amounts of labeled data, programmatic weak supervision is a growing paradigm within machine learning. In this setting, users design heuristics that provide noisy labels for subsets of the data. These weak labels are combined (typically via a graphical model) to form pseudolabels, which are then used to train a downstream model. In this work, we question a foundational premise of the typical weakly supervised learning pipeline: given that the heuristic provides all ``label" information, why do we need to generate pseudolabels at all? Instead, we propose to directly transform the heuristics themselves into corresponding loss functions that penalize differences between our model and the heuristic. By constructing losses directly from the heuristics, we can incorporate more information than is used in the standard weakly supervised pipeline, such as how the heuristics make their decisions, which explicitly informs feature selection during training. We call our method Losses over Labels (LoL) as it creates losses directly from heuristics without going through the intermediate step of a label. We show that LoL improves upon existing weak supervision methods on several benchmark text and image classification tasks and further demonstrate that incorporating gradient information leads to better performance on almost every task.  ( 2 min )
    Beyond One-Preference-for-All: Multi-Objective Direct Preference Optimization. (arXiv:2310.03708v1 [cs.LG])
    Language models (LMs), despite aligning well with an average labeler through reinforcement learning from human feedback (RLHF), may not universally suit diverse human preferences. Recent approaches therefore opt for customization by collecting multi-dimensional feedback and creating distinct rewards for each dimension (e.g., helpfulness, harmlessness, honesty). LMs can then be tailored to different preferences using multi-objective RL (MORL) with different reward weightings. Yet, RL fine-tuning is unstable and resource-heavy, especially for MORLHF with diverse and usually conflicting objectives. In this paper, we present Multi-Objective Direct Preference Optimization (MODPO), an RL-free algorithm that extends Direct Preference Optimization (DPO) for multiple alignment objectives. Essentially, MODPO trains different LMs to represent different collective reward models that combine all objectives with specific weightings. With a simple cross-entropy loss, the LMs optimized against the MODPO objective are analytically the exact solutions of the original MORLHF objective. Empirical results in safety alignment and long-form question answering confirm that MODPO matches or outperforms existing methods, efficiently producing a Pareto-optimal set of LMs that cater to diverse preferences with 3 times less computational resources compared with MORLHF.  ( 2 min )
    A Long Way to Go: Investigating Length Correlations in RLHF. (arXiv:2310.03716v1 [cs.CL])
    Great successes have been reported using Reinforcement Learning from Human Feedback (RLHF) to align large language models. Open-source preference datasets and reward models have enabled wider experimentation beyond generic chat settings, particularly to make systems more "helpful" for tasks like web question answering, summarization, and multi-turn dialogue. When optimizing for helpfulness, RLHF has been consistently observed to drive models to produce longer outputs. This paper demonstrates that optimizing for response length is a significant factor behind RLHF's reported improvements in these settings. First, we study the relationship between reward and length for reward models trained on three open-source preference datasets for helpfulness. Here, length correlates strongly with reward, and improvements in reward score are driven in large part by shifting the distribution over output lengths. We then explore interventions during both RL and reward model learning to see if we can achieve the same downstream improvements as RLHF without increasing length. While our interventions mitigate length increases, they aren't uniformly effective across settings. Furthermore, we find that even running RLHF with a reward based solely on length can reproduce most of the downstream improvements over the initial policy model, showing that reward models in these settings have a long way to go.  ( 2 min )
    Which mode is better for federated learning? Centralized or Decentralized. (arXiv:2310.03461v1 [cs.LG])
    Both centralized and decentralized approaches have shown excellent performance and great application value in federated learning (FL). However, current studies do not provide sufficient evidence to show which one performs better. Although from the optimization perspective, decentralized methods can approach the comparable convergence of centralized methods with less communication, its test performance has always been inefficient in empirical studies. To comprehensively explore their behaviors in FL, we study their excess risks, including the joint analysis of both optimization and generalization. We prove that on smooth non-convex objectives, 1) centralized FL (CFL) always generalizes better than decentralized FL (DFL); 2) from perspectives of the excess risk and test error in CFL, adopting partial participation is superior to full participation; and, 3) there is a necessary requirement for the topology in DFL to avoid performance collapse as the training scale increases. Based on some simple hardware metrics, we could evaluate which framework is better in practice. Extensive experiments are conducted on common setups in FL to validate that our theoretical analysis is contextually valid in practical scenarios.  ( 2 min )
    In ChatGPT We Trust? Measuring and Characterizing the Reliability of ChatGPT. (arXiv:2304.08979v2 [cs.CR] UPDATED)
    The way users acquire information is undergoing a paradigm shift with the advent of ChatGPT. Unlike conventional search engines, ChatGPT retrieves knowledge from the model itself and generates answers for users. ChatGPT's impressive question-answering (QA) capability has attracted more than 100 million users within a short period of time but has also raised concerns regarding its reliability. In this paper, we perform the first large-scale measurement of ChatGPT's reliability in the generic QA scenario with a carefully curated set of 5,695 questions across ten datasets and eight domains. We find that ChatGPT's reliability varies across different domains, especially underperforming in law and science questions. We also demonstrate that system roles, originally designed by OpenAI to allow users to steer ChatGPT's behavior, can impact ChatGPT's reliability in an imperceptible way. We further show that ChatGPT is vulnerable to adversarial examples, and even a single character change can negatively affect its reliability in certain cases. We believe that our study provides valuable insights into ChatGPT's reliability and underscores the need for strengthening the reliability and security of large language models (LLMs).  ( 2 min )
    Strategic Evaluation: Subjects, Evaluators, and Society. (arXiv:2310.03655v1 [cs.CY])
    A broad current application of algorithms is in formal and quantitative measures of murky concepts -- like merit -- to make decisions. When people strategically respond to these sorts of evaluations in order to gain favorable decision outcomes, their behavior can be subjected to moral judgments. They may be described as 'gaming the system' or 'cheating,' or (in other cases) investing 'honest effort' or 'improving.' Machine learning literature on strategic behavior has tried to describe these dynamics by emphasizing the efforts expended by decision subjects hoping to obtain a more favorable assessment -- some works offer ways to preempt or prevent such manipulations, some differentiate 'gaming' from 'improvement' behavior, while others aim to measure the effort burden or disparate effects of classification systems. We begin from a different starting point: that the design of an evaluation itself can be understood as furthering goals held by the evaluator which may be misaligned with broader societal goals. To develop the idea that evaluation represents a strategic interaction in which both the evaluator and the subject of their evaluation are operating out of self-interest, we put forward a model that represents the process of evaluation using three interacting agents: a decision subject, an evaluator, and society, representing a bundle of values and oversight mechanisms. We highlight our model's applicability to a number of social systems where one or two players strategically undermine the others' interests to advance their own. Treating evaluators as themselves strategic allows us to re-cast the scrutiny directed at decision subjects, towards the incentives that underpin institutional designs of evaluations. The moral standing of strategic behaviors often depend on the moral standing of the evaluations and incentives that provoke such behaviors.
    Extreme sparsification of physics-augmented neural networks for interpretable model discovery in mechanics. (arXiv:2310.03652v1 [cs.CE])
    Data-driven constitutive modeling with neural networks has received increased interest in recent years due to its ability to easily incorporate physical and mechanistic constraints and to overcome the challenging and time-consuming task of formulating phenomenological constitutive laws that can accurately capture the observed material response. However, even though neural network-based constitutive laws have been shown to generalize proficiently, the generated representations are not easily interpretable due to their high number of trainable parameters. Sparse regression approaches exist that allow to obtaining interpretable expressions, but the user is tasked with creating a library of model forms which by construction limits their expressiveness to the functional forms provided in the libraries. In this work, we propose to train regularized physics-augmented neural network-based constitutive models utilizing a smoothed version of $L^{0}$-regularization. This aims to maintain the trustworthiness inherited by the physical constraints, but also enables interpretability which has not been possible thus far on any type of machine learning-based constitutive model where model forms were not assumed a-priory but were actually discovered. During the training process, the network simultaneously fits the training data and penalizes the number of active parameters, while also ensuring constitutive constraints such as thermodynamic consistency. We show that the method can reliably obtain interpretable and trustworthy constitutive models for compressible and incompressible hyperelasticity, yield functions, and hardening models for elastoplasticity, for synthetic and experimental data.
    Network Alignment with Transferable Graph Autoencoders. (arXiv:2310.03272v1 [cs.LG])
    Network alignment is the task of establishing one-to-one correspondences between the nodes of different graphs and finds a plethora of applications in high-impact domains. However, this task is known to be NP-hard in its general form, and existing algorithms do not scale up as the size of the graphs increases. To tackle both challenges we propose a novel generalized graph autoencoder architecture, designed to extract powerful and robust node embeddings, that are tailored to the alignment task. We prove that the generated embeddings are associated with the eigenvalues and eigenvectors of the graphs and can achieve more accurate alignment compared to classical spectral methods. Our proposed framework also leverages transfer learning and data augmentation to achieve efficient network alignment at a very large scale without retraining. Extensive experiments on both network and sub-network alignment with real-world graphs provide corroborating evidence supporting the effectiveness and scalability of the proposed approach.
    Solving Diffusion ODEs with Optimal Boundary Conditions for Better Image Super-Resolution. (arXiv:2305.15357v3 [eess.IV] UPDATED)
    Diffusion models, as a kind of powerful generative model, have given impressive results on image super-resolution (SR) tasks. However, due to the randomness introduced in the reverse process of diffusion models, the performances of diffusion-based SR models are fluctuating at every time of sampling, especially for samplers with few resampled steps. This inherent randomness of diffusion models results in ineffectiveness and instability, making it challenging for users to guarantee the quality of SR results. However, our work takes this randomness as an opportunity: fully analyzing and leveraging it leads to the construction of an effective plug-and-play sampling method that owns the potential to benefit a series of diffusion-based SR methods. More in detail, we propose to steadily sample high-quality SR images from pre-trained diffusion-based SR models by solving diffusion ordinary differential equations (diffusion ODEs) with optimal boundary conditions (BCs) and analyze the characteristics between the choices of BCs and their corresponding SR results. Our analysis shows the route to obtain an approximately optimal BC via an efficient exploration in the whole space. The quality of SR results sampled by the proposed method with fewer steps outperforms the quality of results sampled by current methods with randomness from the same pre-trained diffusion-based SR model, which means that our sampling method "boosts" current diffusion-based SR models without any additional training.
    Banach Space Optimality of Neural Architectures With Multivariate Nonlinearities. (arXiv:2310.03696v1 [stat.ML])
    We investigate the variational optimality (specifically, the Banach space optimality) of a large class of neural architectures with multivariate nonlinearities/activation functions. To that end, we construct a new family of Banach spaces defined via a regularization operator and the $k$-plane transform. We prove a representer theorem that states that the solution sets to learning problems posed over these Banach spaces are completely characterized by neural architectures with multivariate nonlinearities. These optimal architectures have skip connections and are tightly connected to orthogonal weight normalization and multi-index models, both of which have received considerable interest in the neural network community. Our framework is compatible with a number of classical nonlinearities including the rectified linear unit (ReLU) activation function, the norm activation function, and the radial basis functions found in the theory of thin-plate/polyharmonic splines. We also show that the underlying spaces are special instances of reproducing kernel Banach spaces and variation spaces. Our results shed light on the regularity of functions learned by neural networks trained on data, particularly with multivariate nonlinearities, and provide new theoretical motivation for several architectural choices found in practice.
    Practical Homomorphic Aggregation for Byzantine ML. (arXiv:2309.05395v3 [cs.LG] UPDATED)
    Due to the large-scale availability of data, machine learning (ML) algorithms are being deployed in distributed topologies, where different nodes collaborate to train ML models over their individual data by exchanging model-related information (e.g., gradients) with a central server. However, distributed learning schemes are notably vulnerable to two threats. First, Byzantine nodes can single-handedly corrupt the learning by sending incorrect information to the server, e.g., erroneous gradients. The standard approach to mitigate such behavior is to use a non-linear robust aggregation method at the server. Second, the server can violate the privacy of the nodes. Recent attacks have shown that exchanging (unencrypted) gradients enables a curious server to recover the totality of the nodes' data. The use of homomorphic encryption (HE), a gold standard security primitive, has extensively been studied as a privacy-preserving solution to distributed learning in non-Byzantine scenarios. However, due to HE's large computational demand especially for high-dimensional ML models, there has not yet been any attempt to design purely homomorphic operators for non-linear robust aggregators. In this work, we present SABLE, the first completely homomorphic and Byzantine robust distributed learning algorithm. SABLE essentially relies on a novel plaintext encoding method that enables us to implement the robust aggregator over batching-friendly BGV. Moreover, this encoding scheme also accelerates state-of-the-art homomorphic sorting with larger security margins and smaller ciphertext size. We perform extensive experiments on image classification tasks and show that our algorithm achieves practical execution times while matching the ML performance of its non-private counterpart.
    PostRainBench: A comprehensive benchmark and a new model for precipitation forecasting. (arXiv:2310.02676v2 [cs.LG] UPDATED)
    Accurate precipitation forecasting is a vital challenge of both scientific and societal importance. Data-driven approaches have emerged as a widely used solution for addressing this challenge. However, solely relying on data-driven approaches has limitations in modeling the underlying physics, making accurate predictions difficult. Coupling AI-based post-processing techniques with traditional Numerical Weather Prediction (NWP) methods offers a more effective solution for improving forecasting accuracy. Despite previous post-processing efforts, accurately predicting heavy rainfall remains challenging due to the imbalanced precipitation data across locations and complex relationships between multiple meteorological variables. To address these limitations, we introduce the PostRainBench, a comprehensive multi-variable NWP post-processing benchmark consisting of three datasets for NWP post-processing-based precipitation forecasting. We propose CAMT, a simple yet effective Channel Attention Enhanced Multi-task Learning framework with a specially designed weighted loss function. Its flexible design allows for easy plug-and-play integration with various backbones. Extensive experimental results on the proposed benchmark show that our method outperforms state-of-the-art methods by 6.3%, 4.7%, and 26.8% in rain CSI on the three datasets respectively. Most notably, our model is the first deep learning-based method to outperform traditional Numerical Weather Prediction (NWP) approaches in extreme precipitation conditions. It shows improvements of 15.6%, 17.4%, and 31.8% over NWP predictions in heavy rain CSI on respective datasets. These results highlight the potential impact of our model in reducing the severe consequences of extreme weather events.
    An Empirical Study of AI Generated Text Detection Tools. (arXiv:2310.01423v1 [cs.CL] CROSS LISTED)
    Since ChatGPT has emerged as a major AIGC model, providing high-quality responses across a wide range of applications (including software development and maintenance), it has attracted much interest from many individuals. ChatGPT has great promise, but there are serious problems that might arise from its misuse, especially in the realms of education and public safety. Several AIGC detectors are available, and they have all been tested on genuine text. However, more study is needed to see how effective they are for multi-domain ChatGPT material. This study aims to fill this need by creating a multi-domain dataset for testing the state-of-the-art APIs and tools for detecting artificially generated information used by universities and other research institutions. A large dataset consisting of articles, abstracts, stories, news, and product reviews was created for this study. The second step is to use the newly created dataset to put six tools through their paces. Six different artificial intelligence (AI) text identification systems, including "GPTkit," "GPTZero," "Originality," "Sapling," "Writer," and "Zylalab," have accuracy rates between 55.29 and 97.0%. Although all the tools fared well in the evaluations, originality was particularly effective across the board.
    Enhanced Human-Robot Collaboration using Constrained Probabilistic Human-Motion Prediction. (arXiv:2310.03314v1 [cs.RO])
    Human motion prediction is an essential step for efficient and safe human-robot collaboration. Current methods either purely rely on representing the human joints in some form of neural network-based architecture or use regression models offline to fit hyper-parameters in the hope of capturing a model encompassing human motion. While these methods provide good initial results, they are missing out on leveraging well-studied human body kinematic models as well as body and scene constraints which can help boost the efficacy of these prediction frameworks while also explicitly avoiding implausible human joint configurations. We propose a novel human motion prediction framework that incorporates human joint constraints and scene constraints in a Gaussian Process Regression (GPR) model to predict human motion over a set time horizon. This formulation is combined with an online context-aware constraints model to leverage task-dependent motions. It is tested on a human arm kinematic model and implemented on a human-robot collaborative setup with a UR5 robot arm to demonstrate the real-time capability of our approach. Simulations were also performed on datasets like HA4M and ANDY. The simulation and experimental results demonstrate considerable improvements in a Gaussian Process framework when these constraints are explicitly considered.
    Efficient Anatomical Labeling of Pulmonary Tree Structures via Implicit Point-Graph Networks. (arXiv:2309.17329v2 [cs.CV] UPDATED)
    Pulmonary diseases rank prominently among the principal causes of death worldwide. Curing them will require, among other things, a better understanding of the many complex 3D tree-shaped structures within the pulmonary system, such as airways, arteries, and veins. In theory, they can be modeled using high-resolution image stacks. Unfortunately, standard CNN approaches operating on dense voxel grids are prohibitively expensive. To remedy this, we introduce a point-based approach that preserves graph connectivity of tree skeleton and incorporates an implicit surface representation. It delivers SOTA accuracy at a low computational cost and the resulting models have usable surfaces. Due to the scarcity of publicly accessible data, we have also curated an extensive dataset to evaluate our approach and will make it public.
    Fictitious Cross-Play: Learning Global Nash Equilibrium in Mixed Cooperative-Competitive Games. (arXiv:2310.03354v1 [cs.AI])
    Self-play (SP) is a popular multi-agent reinforcement learning (MARL) framework for solving competitive games, where each agent optimizes policy by treating others as part of the environment. Despite the empirical successes, the theoretical properties of SP-based methods are limited to two-player zero-sum games. However, for mixed cooperative-competitive games where agents on the same team need to cooperate with each other, we can show a simple counter-example where SP-based methods cannot converge to a global Nash equilibrium (NE) with high probability. Alternatively, Policy-Space Response Oracles (PSRO) is an iterative framework for learning NE, where the best responses w.r.t. previous policies are learned in each iteration. PSRO can be directly extended to mixed cooperative-competitive settings by jointly learning team best responses with all convergence properties unchanged. However, PSRO requires repeatedly training joint policies from scratch till convergence, which makes it hard to scale to complex games. In this work, we develop a novel algorithm, Fictitious Cross-Play (FXP), which inherits the benefits from both frameworks. FXP simultaneously trains an SP-based main policy and a counter population of best response policies. The main policy is trained by fictitious self-play and cross-play against the counter population, while the counter policies are trained as the best responses to the main policy's past versions. We validate our method in matrix games and show that FXP converges to global NEs while SP methods fail. We also conduct experiments in a gridworld domain, where FXP achieves higher Elo ratings and lower exploitabilities than baselines, and a more challenging football game, where FXP defeats SOTA models with over 94% win rate.
    Machine learning the interaction network in coupled dynamical systems. (arXiv:2310.03378v1 [math.DS])
    The study of interacting dynamical systems continues to attract research interest in various fields of science and engineering. In a collection of interacting particles, the interaction network contains information about how various components interact with one another. Inferring the information about the interaction network from the dynamics of agents is a problem of long-standing interest. In this work, we employ a self-supervised neural network model to achieve two outcomes: to recover the interaction network and to predict the dynamics of individual agents. Both these information are inferred solely from the observed trajectory data. This work presents an application of the Neural Relational Inference model to two dynamical systems: coupled particles mediated by Hooke's law interaction and coupled phase (Kuramoto) oscillators.
    Three-Way Trade-Off in Multi-Objective Learning: Optimization, Generalization and Conflict-Avoidance. (arXiv:2305.20057v3 [cs.LG] UPDATED)
    Multi-objective learning (MOL) problems often arise in emerging machine learning problems when there are multiple learning criteria, data modalities, or learning tasks. Different from single-objective learning, one of the critical challenges in MOL is the potential conflict among different objectives during the iterative optimization process. Recent works have developed various dynamic weighting algorithms for MOL such as MGDA and its variants, where the central idea is to find an update direction that avoids conflicts among objectives. Albeit its appealing intuition, empirical studies show that dynamic weighting methods may not always outperform static ones. To understand this theory-practical gap, we focus on a new stochastic variant of MGDA - the Multi-objective gradient with Double sampling (MoDo) algorithm, and study the generalization performance of the dynamic weighting-based MoDo and its interplay with optimization through the lens of algorithm stability. Perhaps surprisingly, we find that the key rationale behind MGDA -- updating along conflict-avoidant direction - may hinder dynamic weighting algorithms from achieving the optimal ${\cal O}(1/\sqrt{n})$ population risk, where $n$ is the number of training samples. We further demonstrate the impact of the variability of dynamic weights on the three-way trade-off among optimization, generalization, and conflict avoidance that is unique in MOL. We showcase the generality of our theoretical framework by analyzing other existing stochastic MOL algorithms under the framework. Experiments on various multi-task learning benchmarks are performed to demonstrate the practical applicability. Code is available at https://github.com/heshandevaka/Trade-Off-MOL.
    CLEVRER-Humans: Describing Physical and Causal Events the Human Way. (arXiv:2310.03635v1 [cs.AI])
    Building machines that can reason about physical events and their causal relationships is crucial for flexible interaction with the physical world. However, most existing physical and causal reasoning benchmarks are exclusively based on synthetically generated events and synthetic natural language descriptions of causal relationships. This design brings up two issues. First, there is a lack of diversity in both event types and natural language descriptions; second, causal relationships based on manually-defined heuristics are different from human judgments. To address both shortcomings, we present the CLEVRER-Humans benchmark, a video reasoning dataset for causal judgment of physical events with human labels. We employ two techniques to improve data collection efficiency: first, a novel iterative event cloze task to elicit a new representation of events in videos, which we term Causal Event Graphs (CEGs); second, a data augmentation technique based on neural language generative models. We convert the collected CEGs into questions and answers to be consistent with prior work. Finally, we study a collection of baseline approaches for CLEVRER-Humans question-answering, highlighting the great challenges set forth by our benchmark.
    SqueezeLLM: Dense-and-Sparse Quantization. (arXiv:2306.07629v2 [cs.CL] UPDATED)
    Generative Large Language Models (LLMs) have demonstrated remarkable results for a wide range of tasks. However, deploying these models for inference has been a significant challenge due to their unprecedented resource requirements. This has forced existing deployment frameworks to use multi-GPU inference pipelines, which are often complex and costly, or to use smaller and less performant models. In this work, we demonstrate that the main bottleneck for generative inference with LLMs is memory bandwidth, rather than compute, specifically for single batch inference. While quantization has emerged as a promising solution by representing model weights with reduced precision, previous efforts have often resulted in notable performance degradation. To address this, we introduce SqueezeLLM, a post-training quantization framework that not only enables lossless compression to ultra-low precisions of up to 3-bit, but also achieves higher quantization performance under the same memory constraint. Our framework incorporates two novel ideas: (i) sensitivity-based non-uniform quantization, which searches for the optimal bit precision assignment based on second-order information; and (ii) the Dense-and-Sparse decomposition that stores outliers and sensitive weight values in an efficient sparse format. When applied to the LLaMA models, our 3-bit quantization significantly reduces the perplexity gap from the FP16 baseline by up to 2.1x as compared to the state-of-the-art methods with the same memory requirement. Furthermore, when deployed on an A6000 GPU, our quantized models achieve up to 2.3x speedup compared to the baseline. Our code is open-sourced and available online.
    An Integrated Algorithm for Robust and Imperceptible Audio Adversarial Examples. (arXiv:2310.03349v1 [cs.SD])
    Audio adversarial examples are audio files that have been manipulated to fool an automatic speech recognition (ASR) system, while still sounding benign to a human listener. Most methods to generate such samples are based on a two-step algorithm: first, a viable adversarial audio file is produced, then, this is fine-tuned with respect to perceptibility and robustness. In this work, we present an integrated algorithm that uses psychoacoustic models and room impulse responses (RIR) in the generation step. The RIRs are dynamically created by a neural network during the generation process to simulate a physical environment to harden our examples against transformations experienced in over-the-air attacks. We compare the different approaches in three experiments: in a simulated environment and in a realistic over-the-air scenario to evaluate the robustness, and in a human study to evaluate the perceptibility. Our algorithms considering psychoacoustics only or in addition to the robustness show an improvement in the signal-to-noise ratio (SNR) as well as in the human perception study, at the cost of an increased word error rate (WER).
    Towards practical reinforcement learning for tokamak magnetic control. (arXiv:2307.11546v2 [physics.plasm-ph] UPDATED)
    Reinforcement learning (RL) has shown promising results for real-time control systems, including the domain of plasma magnetic control. However, there are still significant drawbacks compared to traditional feedback control approaches for magnetic confinement. In this work, we address key drawbacks of the RL method; achieving higher control accuracy for desired plasma properties, reducing the steady-state error, and decreasing the required time to learn new tasks. We build on top of \cite{degrave2022magnetic}, and present algorithmic improvements to the agent architecture and training procedure. We present simulation results that show up to 65\% improvement in shape accuracy, achieve substantial reduction in the long-term bias of the plasma current, and additionally reduce the training time required to learn new tasks by a factor of 3 or more. We present new experiments using the upgraded RL-based controllers on the TCV tokamak, which validate the simulation results achieved, and point the way towards routinely achieving accurate discharges using the RL approach.
    Deep Geometric Learning with Monotonicity Constraints for Alzheimer's Disease Progression. (arXiv:2310.03353v1 [cs.AI])
    Alzheimer's disease (AD) is a devastating neurodegenerative condition that precedes progressive and irreversible dementia; thus, predicting its progression over time is vital for clinical diagnosis and treatment. Numerous studies have implemented structural magnetic resonance imaging (MRI) to model AD progression, focusing on three integral aspects: (i) temporal variability, (ii) incomplete observations, and (iii) temporal geometric characteristics. However, deep learning-based approaches regarding data variability and sparsity have yet to consider inherent geometrical properties sufficiently. The ordinary differential equation-based geometric modeling method (ODE-RGRU) has recently emerged as a promising strategy for modeling time-series data by intertwining a recurrent neural network and an ODE in Riemannian space. Despite its achievements, ODE-RGRU encounters limitations when extrapolating positive definite symmetric metrics from incomplete samples, leading to feature reverse occurrences that are particularly problematic, especially within the clinical facet. Therefore, this study proposes a novel geometric learning approach that models longitudinal MRI biomarkers and cognitive scores by combining three modules: topological space shift, ODE-RGRU, and trajectory estimation. We have also developed a training algorithm that integrates manifold mapping with monotonicity constraints to reflect measurement transition irreversibility. We verify our proposed method's efficacy by predicting clinical labels and cognitive scores over time in regular and irregular settings. Furthermore, we thoroughly analyze our proposed framework through an ablation study.
    Swin-Tempo: Temporal-Aware Lung Nodule Detection in CT Scans as Video Sequences Using Swin Transformer-Enhanced UNet. (arXiv:2310.03365v1 [eess.IV])
    Lung cancer is highly lethal, emphasizing the critical need for early detection. However, identifying lung nodules poses significant challenges for radiologists, who rely heavily on their expertise and experience for accurate diagnosis. To address this issue, computer-aided diagnosis systems based on machine learning techniques have emerged to assist doctors in identifying lung nodules from computed tomography (CT) scans. Unfortunately, existing networks in this domain often suffer from computational complexity, leading to high rates of false negatives and false positives, limiting their effectiveness. To address these challenges, we present an innovative model that harnesses the strengths of both convolutional neural networks and vision transformers. Inspired by object detection in videos, we treat each 3D CT image as a video, individual slices as frames, and lung nodules as objects, enabling a time-series application. The primary objective of our work is to overcome hardware limitations during model training, allowing for efficient processing of 2D data while utilizing inter-slice information for accurate identification based on 3D image context. We validated the proposed network by applying a 10-fold cross-validation technique to the publicly available Lung Nodule Analysis 2016 dataset. Our proposed architecture achieves an average sensitivity criterion of 97.84% and a competition performance metrics (CPM) of 96.0% with few parameters. Comparative analysis with state-of-the-art advancements in lung nodule identification demonstrates the significant accuracy achieved by our proposed model.
    Paying Attention to Astronomical Transients: Introducing the Time-series Transformer for Photometric Classification. (arXiv:2105.06178v3 [astro-ph.IM] UPDATED)
    Future surveys such as the Legacy Survey of Space and Time (LSST) of the Vera C. Rubin Observatory will observe an order of magnitude more astrophysical transient events than any previous survey before. With this deluge of photometric data, it will be impossible for all such events to be classified by humans alone. Recent efforts have sought to leverage machine learning methods to tackle the challenge of astronomical transient classification, with ever improving success. Transformers are a recently developed deep learning architecture, first proposed for natural language processing, that have shown a great deal of recent success. In this work we develop a new transformer architecture, which uses multi-head self attention at its core, for general multi-variate time-series data. Furthermore, the proposed time-series transformer architecture supports the inclusion of an arbitrary number of additional features, while also offering interpretability. We apply the time-series transformer to the task of photometric classification, minimising the reliance of expert domain knowledge for feature selection, while achieving results comparable to state-of-the-art photometric classification methods. We achieve a logarithmic-loss of 0.507 on imbalanced data in a representative setting using data from the Photometric LSST Astronomical Time-Series Classification Challenge (PLAsTiCC). Moreover, we achieve a micro-averaged receiver operating characteristic area under curve of 0.98 and micro-averaged precision-recall area under curve of 0.87.
    DeepHGCN: Toward Deeper Hyperbolic Graph Convolutional Networks. (arXiv:2310.02027v2 [cs.LG] UPDATED)
    Hyperbolic graph convolutional networks (HGCN) have demonstrated significant potential in extracting information from hierarchical graphs. However, existing HGCNs are limited to shallow architectures, due to the expensive hyperbolic operations and the over-smoothing issue as depth increases. Although in GCNs, treatments have been applied to alleviate over-smoothing, developing a hyperbolic therapy presents distinct challenges since operations should be carefully designed to fit the hyperbolic nature. Addressing the above challenges, in this work, we propose DeepHGCN, the first deep multi-layer HGCN architecture with dramatically improved computational efficiency and substantially alleviated over-smoothing effect. DeepHGCN presents two key enablers of deep HGCNs: (1) a novel hyperbolic feature transformation layer that enables fast and accurate linear maps; and (2) Techniques such as hyperbolic residual connections and regularization for both weights and features facilitated by an efficient hyperbolic midpoint method. Extensive experiments demonstrate that DeepHGCN obtains significant improvements in link prediction and node classification tasks compared to both Euclidean and shallow hyperbolic GCN variants.
    Rethinking Fairness for Human-AI Collaboration. (arXiv:2310.03647v1 [cs.LG])
    Existing approaches to algorithmic fairness aim to ensure equitable outcomes if human decision-makers comply perfectly with algorithmic decisions. However, perfect compliance with the algorithm is rarely a reality or even a desirable outcome in human-AI collaboration. Yet, recent studies have shown that selective compliance with fair algorithms can amplify discrimination relative to the prior human policy. As a consequence, ensuring equitable outcomes requires fundamentally different algorithmic design principles that ensure robustness to the decision-maker's (a priori unknown) compliance pattern. We define the notion of compliance-robustly fair algorithmic recommendations that are guaranteed to (weakly) improve fairness in decisions, regardless of the human's compliance pattern. We propose a simple optimization strategy to identify the best performance-improving compliance-robustly fair policy. However, we show that it may be infeasible to design algorithmic recommendations that are simultaneously fair in isolation, compliance-robustly fair, and more accurate than the human policy; thus, if our goal is to improve the equity and accuracy of human-AI collaboration, it may not be desirable to enforce traditional fairness constraints.  ( 2 min )
    Modularizing while Training: A New Paradigm for Modularizing DNN Models. (arXiv:2306.09376v3 [cs.LG] UPDATED)
    Deep neural network (DNN) models have become increasingly crucial components in intelligent software systems. However, training a DNN model is typically expensive in terms of both time and money. To address this issue, researchers have recently focused on reusing existing DNN models - borrowing the idea of code reuse in software engineering. However, reusing an entire model could cause extra overhead or inherits the weakness from the undesired functionalities. Hence, existing work proposes to decompose an already trained model into modules, i.e., modularizing-after-training, and enable module reuse. Since trained models are not built for modularization, modularizing-after-training incurs huge overhead and model accuracy loss. In this paper, we propose a novel approach that incorporates modularization into the model training process, i.e., modularizing-while-training (MwT). We train a model to be structurally modular through two loss functions that optimize intra-module cohesion and inter-module coupling. We have implemented the proposed approach for modularizing Convolutional Neural Network (CNN) models in this work. The evaluation results on representative models demonstrate that MwT outperforms the state-of-the-art approach. Specifically, the accuracy loss caused by MwT is only 1.13 percentage points, which is 1.76 percentage points less than that of the baseline. The kernel retention rate of the modules generated by MwT is only 14.58%, with a reduction of 74.31% over the state-of-the-art approach. Furthermore, the total time cost required for training and modularizing is only 108 minutes, half of the baseline.
    Diffeomorphic Multi-Resolution Deep Learning Registration for Applications in Breast MRI. (arXiv:2309.13777v2 [eess.IV] UPDATED)
    In breast surgical planning, accurate registration of MR images across patient positions has the potential to improve the localisation of tumours during breast cancer treatment. While learning-based registration methods have recently become the state-of-the-art approach for most medical image registration tasks, these methods have yet to make inroads into breast image registration due to certain difficulties-the lack of rich texture information in breast MR images and the need for the deformations to be diffeomophic. In this work, we propose learning strategies for breast MR image registration that are amenable to diffeomorphic constraints, together with early experimental results from in-silico and in-vivo experiments. One key contribution of this work is a registration network which produces superior registration outcomes for breast images in addition to providing diffeomorphic guarantees.
    Learning to Simplify Spatial-Temporal Graphs in Gait Analysis. (arXiv:2310.03396v1 [cs.CV])
    Gait analysis leverages unique walking patterns for person identification and assessment across multiple domains. Among the methods used for gait analysis, skeleton-based approaches have shown promise due to their robust and interpretable features. However, these methods often rely on hand-crafted spatial-temporal graphs that are based on human anatomy disregarding the particularities of the dataset and task. This paper proposes a novel method to simplify the spatial-temporal graph representation for gait-based gender estimation, improving interpretability without losing performance. Our approach employs two models, an upstream and a downstream model, that can adjust the adjacency matrix for each walking instance, thereby removing the fixed nature of the graph. By employing the Straight-Through Gumbel-Softmax trick, our model is trainable end-to-end. We demonstrate the effectiveness of our approach on the CASIA-B dataset for gait-based gender estimation. The resulting graphs are interpretable and differ qualitatively from fixed graphs used in existing models. Our research contributes to enhancing the explainability and task-specific adaptability of gait recognition, promoting more efficient and reliable gait-based biometrics.
    Bridging the Gap Between Foundation Models and Heterogeneous Federated Learning. (arXiv:2310.00247v2 [cs.LG] UPDATED)
    Federated learning (FL) offers privacy-preserving decentralized machine learning, optimizing models at edge clients without sharing private data. Simultaneously, foundation models (FMs) have gained traction in the artificial intelligence (AI) community due to their exceptional performance across various tasks. However, integrating FMs into FL presents challenges, primarily due to their substantial size and intensive resource requirements. This is especially true when considering the resource heterogeneity in edge FL systems. We present an adaptive framework for Resource-aware Federated Foundation Models (RaFFM) to address these challenges. RaFFM introduces specialized model compression algorithms tailored for FL scenarios, such as salient parameter prioritization and high-performance subnetwork extraction. These algorithms enable dynamic scaling of given transformer-based FMs to fit heterogeneous resource constraints at the network edge during both FL's optimization and deployment stages. Experimental results demonstrate that RaFFM shows significant superiority in resource utilization efficiency and uses fewer resources to deploy FMs to FL. Despite the lower resource consumption, target models optimized by RaFFM achieve performance on par with traditional FL methods applied to full-sized FMs. This is evident across tasks in both natural language processing and computer vision domains.
    Towards Optimal Neural Networks: the Role of Sample Splitting in Hyperparameter Selection. (arXiv:2307.07726v2 [stat.ML] UPDATED)
    When artificial neural networks have demonstrated exceptional practical success in a variety of domains, investigations into their theoretical characteristics, such as their approximation power, statistical properties, and generalization performance, have concurrently made significant strides. In this paper, we construct a novel theory for understanding the effectiveness of neural networks, which offers a perspective distinct from prior research. Specifically, we explore the rationale underlying a common practice during the construction of neural network models: sample splitting. Our findings indicate that the optimal hyperparameters derived from sample splitting can enable a neural network model that asymptotically minimizes the prediction risk. We conduct extensive experiments across different application scenarios and network architectures, and the results manifest our theory's effectiveness.
    Mechanic Maker 2.0: Reinforcement Learning for Evaluating Generated Rules. (arXiv:2309.09476v3 [cs.AI] UPDATED)
    Automated game design (AGD), the study of automatically generating game rules, has a long history in technical games research. AGD approaches generally rely on approximations of human play, either objective functions or AI agents. Despite this, the majority of these approximators are static, meaning they do not reflect human player's ability to learn and improve in a game. In this paper, we investigate the application of Reinforcement Learning (RL) as an approximator for human play for rule generation. We recreate the classic AGD environment Mechanic Maker in Unity as a new, open-source rule generation framework. Our results demonstrate that RL produces distinct sets of rules from an A* agent baseline, which may be more usable by humans.
    TRAM: Bridging Trust Regions and Sharpness Aware Minimization. (arXiv:2310.03646v1 [cs.LG])
    By reducing the curvature of the loss surface in the parameter space, Sharpness-aware minimization (SAM) yields widespread robustness improvement under domain transfer. Instead of focusing on parameters, however, this work considers the transferability of representations as the optimization target for out-of-domain generalization in a fine-tuning setup. To encourage the retention of transferable representations, we consider trust region-based fine-tuning methods, which exploit task-specific skills without forgetting task-agnostic representations from pre-training. We unify parameter- and representation-space smoothing approaches by using trust region bounds to inform SAM-style regularizers on both of these optimization surfaces. We propose Trust Region Aware Minimization (TRAM), a fine-tuning algorithm that optimizes for flat minima and smooth, informative representations without forgetting pre-trained structure. We find that TRAM outperforms both sharpness-aware and trust region-based optimization methods on cross-domain language modeling and cross-lingual transfer, where robustness to domain transfer and representation generality are critical for success. TRAM establishes a new standard in training generalizable models with minimal additional computation.
    Neural Operators for Accelerating Scientific Simulations and Design. (arXiv:2309.15325v2 [cs.LG] UPDATED)
    Scientific discovery and engineering design are currently limited by the time and cost of physical experiments, selected mostly through trial-and-error and intuition that require deep domain expertise. Numerical simulations present an alternative to physical experiments but are usually infeasible for complex real-world domains due to the computational requirements of existing numerical methods. Artificial intelligence (AI) presents a potential paradigm shift by developing fast data-driven surrogate models. In particular, an AI framework, known as neural operators, presents a principled framework for learning mappings between functions defined on continuous domains, e.g., spatiotemporal processes and partial differential equations (PDE). They can extrapolate and predict solutions at new locations unseen during training, i.e., perform zero-shot super-resolution. Neural operators can augment or even replace existing simulators in many applications, such as computational fluid dynamics, weather forecasting, and material modeling, while being 4-5 orders of magnitude faster. Further, neural operators can be integrated with physics and other domain constraints enforced at finer resolutions to obtain high-fidelity solutions and good generalization. Since neural operators are differentiable, they can directly optimize parameters for inverse design and other inverse problems. We believe that neural operators present a transformative approach to simulation and design, enabling rapid research and development.
    Ablation Study to Clarify the Mechanism of Object Segmentation in Multi-Object Representation Learning. (arXiv:2310.03273v1 [cs.CV])
    Multi-object representation learning aims to represent complex real-world visual input using the composition of multiple objects. Representation learning methods have often used unsupervised learning to segment an input image into individual objects and encode these objects into each latent vector. However, it is not clear how previous methods have achieved the appropriate segmentation of individual objects. Additionally, most of the previous methods regularize the latent vectors using a Variational Autoencoder (VAE). Therefore, it is not clear whether VAE regularization contributes to appropriate object segmentation. To elucidate the mechanism of object segmentation in multi-object representation learning, we conducted an ablation study on MONet, which is a typical method. MONet represents multiple objects using pairs that consist of an attention mask and the latent vector corresponding to the attention mask. Each latent vector is encoded from the input image and attention mask. Then, the component image and attention mask are decoded from each latent vector. The loss function of MONet consists of 1) the sum of reconstruction losses between the input image and decoded component image, 2) the VAE regularization loss of the latent vector, and 3) the reconstruction loss of the attention mask to explicitly encode shape information. We conducted an ablation study on these three loss functions to investigate the effect on segmentation performance. Our results showed that the VAE regularization loss did not affect segmentation performance and the others losses did affect it. Based on this result, we hypothesize that it is important to maximize the attention mask of the image region best represented by a single latent vector corresponding to the attention mask. We confirmed this hypothesis by evaluating a new loss function with the same mechanism as the hypothesis.
    FedJETs: Efficient Just-In-Time Personalization with Federated Mixture of Experts. (arXiv:2306.08586v2 [cs.LG] UPDATED)
    One of the goals in Federated Learning (FL) is to create personalized models that can adapt to the context of each participating client, while utilizing knowledge from a shared global model. Yet, often, personalization requires a fine-tuning step using clients' labeled data in order to achieve good performance. This may not be feasible in scenarios where incoming clients are fresh and/or have privacy concerns. It, then, remains open how one can achieve just-in-time personalization in these scenarios. We propose FedJETs, a novel solution by using a Mixture-of-Experts (MoE) framework within a FL setup. Our method leverages the diversity of the clients to train specialized experts on different subsets of classes, and a gating function to route the input to the most relevant expert(s). Our gating function harnesses the knowledge of a pretrained model common expert to enhance its routing decisions on-the-fly. As a highlight, our approach can improve accuracy up to 18\% in state of the art FL settings, while maintaining competitive zero-shot performance. In practice, our method can handle non-homogeneous data distributions, scale more efficiently, and improve the state-of-the-art performance on common FL benchmarks.
    BioBridge: Bridging Biomedical Foundation Models via Knowledge Graph. (arXiv:2310.03320v1 [cs.LG])
    Foundation models (FMs) are able to leverage large volumes of unlabeled data to demonstrate superior performance across a wide range of tasks. However, FMs developed for biomedical domains have largely remained unimodal, i.e., independently trained and used for tasks on protein sequences alone, small molecule structures alone, or clinical data alone. To overcome this limitation of biomedical FMs, we present BioBridge, a novel parameter-efficient learning framework, to bridge independently trained unimodal FMs to establish multimodal behavior. BioBridge achieves it by utilizing Knowledge Graphs (KG) to learn transformations between one unimodal FM and another without fine-tuning any underlying unimodal FMs. Our empirical results demonstrate that BioBridge can beat the best baseline KG embedding methods (on average by around 76.3%) in cross-modal retrieval tasks. We also identify BioBridge demonstrates out-of-domain generalization ability by extrapolating to unseen modalities or relations. Additionally, we also show that BioBridge presents itself as a general purpose retriever that can aid biomedical multimodal question answering as well as enhance the guided generation of novel drugs.
    Targeted Adversarial Attacks on Generalizable Neural Radiance Fields. (arXiv:2310.03578v1 [cs.LG])
    Neural Radiance Fields (NeRFs) have recently emerged as a powerful tool for 3D scene representation and rendering. These data-driven models can learn to synthesize high-quality images from sparse 2D observations, enabling realistic and interactive scene reconstructions. However, the growing usage of NeRFs in critical applications such as augmented reality, robotics, and virtual environments could be threatened by adversarial attacks. In this paper we present how generalizable NeRFs can be attacked by both low-intensity adversarial attacks and adversarial patches, where the later could be robust enough to be used in real world applications. We also demonstrate targeted attacks, where a specific, predefined output scene is generated by these attack with success.
    Self-supervised Deep Unrolled Reconstruction Using Regularization by Denoising. (arXiv:2205.03519v3 [eess.IV] UPDATED)
    Deep learning methods have been successfully used in various computer vision tasks. Inspired by that success, deep learning has been explored in magnetic resonance imaging (MRI) reconstruction. In particular, integrating deep learning and model-based optimization methods has shown considerable advantages. However, a large amount of labeled training data is typically needed for high reconstruction quality, which is challenging for some MRI applications. In this paper, we propose a novel reconstruction method, named DURED-Net, that enables interpretable self-supervised learning for MR image reconstruction by combining a self-supervised denoising network and a plug-and-play method. We aim to boost the reconstruction performance of Noise2Noise in MR reconstruction by adding an explicit prior that utilizes imaging physics. Specifically, the leverage of a denoising network for MRI reconstruction is achieved using Regularization by Denoising (RED). Experiment results demonstrate that the proposed method requires a reduced amount of training data to achieve high reconstruction quality among the state-of-art of MR reconstruction utilizing the Noise2Noise method.
    EAG-RS: A Novel Explainability-guided ROI-Selection Framework for ASD Diagnosis via Inter-regional Relation Learning. (arXiv:2310.03404v1 [cs.LG])
    Deep learning models based on resting-state functional magnetic resonance imaging (rs-fMRI) have been widely used to diagnose brain diseases, particularly autism spectrum disorder (ASD). Existing studies have leveraged the functional connectivity (FC) of rs-fMRI, achieving notable classification performance. However, they have significant limitations, including the lack of adequate information while using linear low-order FC as inputs to the model, not considering individual characteristics (i.e., different symptoms or varying stages of severity) among patients with ASD, and the non-explainability of the decision process. To cover these limitations, we propose a novel explainability-guided region of interest (ROI) selection (EAG-RS) framework that identifies non-linear high-order functional associations among brain regions by leveraging an explainable artificial intelligence technique and selects class-discriminative regions for brain disease identification. The proposed framework includes three steps: (i) inter-regional relation learning to estimate non-linear relations through random seed-based network masking, (ii) explainable connection-wise relevance score estimation to explore high-order relations between functional connections, and (iii) non-linear high-order FC-based diagnosis-informative ROI selection and classifier learning to identify ASD. We validated the effectiveness of our proposed method by conducting experiments using the Autism Brain Imaging Database Exchange (ABIDE) dataset, demonstrating that the proposed method outperforms other comparative methods in terms of various evaluation metrics. Furthermore, we qualitatively analyzed the selected ROIs and identified ASD subtypes linked to previous neuroscientific studies.
    On the Implicit Bias of Adam. (arXiv:2309.00079v3 [cs.LG] UPDATED)
    In previous literature, backward error analysis was used to find ordinary differential equations (ODEs) approximating the gradient descent trajectory. It was found that finite step sizes implicitly regularize solutions because terms appearing in the ODEs penalize the two-norm of the loss gradients. We prove that the existence of similar implicit regularization in RMSProp and Adam depends on their hyperparameters and the training stage, but with a different "norm" involved: the corresponding ODE terms either penalize the (perturbed) one-norm of the loss gradients or, on the contrary, hinder its decrease (the latter case being typical). We also conduct numerical experiments and discuss how the proven facts can influence generalization.
    SFUSNet: A Spatial-Frequency domain-based Multi-branch Network for diagnosis of Cervical Lymph Node Lesions in Ultrasound Images. (arXiv:2308.16738v2 [eess.IV] UPDATED)
    Booming deep learning has substantially improved the diagnosis for diverse lesions in ultrasound images, but a conspicuous research gap concerning cervical lymph node lesions still remains. The objective of this work is to diagnose cervical lymph node lesions in ultrasound images by leveraging a deep learning model. To this end, we first collected 3392 cervical ultrasound images containing normal lymph nodes, benign lymph node lesions, malignant primary lymph node lesions, and malignant metastatic lymph node lesions. Given that ultrasound images are generated by the reflection and scattering of sound waves across varied bodily tissues, we proposed the Conv-FFT Block. It integrates convolutional operations with the fast Fourier transform to more astutely model the images. Building upon this foundation, we designed a novel architecture, named SFUSNet. SFUSNet not only discerns variances in ultrasound images from the spatial domain but also adeptly captures micro-structural alterations across various lesions in the frequency domain. To ascertain the potential of SFUSNet, we benchmarked it against 12 popular architectures through five-fold cross-validation. The results show that SFUSNet is the state-of-the-art model and can achieve 92.89% accuracy. Moreover, its average precision, average sensitivity and average specificity for four types of lesions achieve 90.46%, 89.95% and 97.49%, respectively.
    Agent Instructs Large Language Models to be General Zero-Shot Reasoners. (arXiv:2310.03710v1 [cs.CL])
    We introduce a method to improve the zero-shot reasoning abilities of large language models on general language understanding tasks. Specifically, we build an autonomous agent to instruct the reasoning process of large language models. We show this approach further unleashes the zero-shot reasoning abilities of large language models to more tasks. We study the performance of our method on a wide set of datasets spanning generation, classification, and reasoning. We show that our method generalizes to most tasks and obtains state-of-the-art zero-shot performance on 20 of the 29 datasets that we evaluate. For instance, our method boosts the performance of state-of-the-art large language models by a large margin, including Vicuna-13b (13.3%), Llama-2-70b-chat (23.2%), and GPT-3.5 Turbo (17.0%). Compared to zero-shot chain of thought, our improvement in reasoning is striking, with an average increase of 10.5%. With our method, Llama-2-70b-chat outperforms zero-shot GPT-3.5 Turbo by 10.2%.
    DyVal: Graph-informed Dynamic Evaluation of Large Language Models. (arXiv:2309.17167v2 [cs.AI] UPDATED)
    Large language models (LLMs) have achieved remarkable performance in various evaluation benchmarks. However, concerns about their performance are raised on potential data contamination in their considerable volume of training corpus. Moreover, the static nature and fixed complexity of current benchmarks may inadequately gauge the advancing capabilities of LLMs. In this paper, we introduce DyVal, a novel, general, and flexible evaluation protocol for dynamic evaluation of LLMs. Based on our proposed dynamic evaluation framework, we build graph-informed DyVal by leveraging the structural advantage of directed acyclic graphs to dynamically generate evaluation samples with controllable complexities. DyVal generates challenging evaluation sets on reasoning tasks including mathematics, logical reasoning, and algorithm problems. We evaluate various LLMs ranging from Flan-T5-large to ChatGPT and GPT4. Experiments demonstrate that LLMs perform worse in DyVal-generated evaluation samples with different complexities, emphasizing the significance of dynamic evaluation. We also analyze the failure cases and results of different prompting methods. Moreover, DyVal-generated samples are not only evaluation sets, but also helpful data for fine-tuning to improve the performance of LLMs on existing benchmarks. We hope that DyVal can shed light on the future evaluation research of LLMs.
    Quantitative CLTs in Deep Neural Networks. (arXiv:2307.06092v4 [cs.LG] UPDATED)
    We study the distribution of a fully connected neural network with random Gaussian weights and biases in which the hidden layer widths are proportional to a large constant $n$. Under mild assumptions on the non-linearity, we obtain quantitative bounds on normal approximations valid at large but finite $n$ and any fixed network depth. Our theorems show both for the finite-dimensional distributions and the entire process, that the distance between a random fully connected network (and its derivatives) to the corresponding infinite width Gaussian process scales like $n^{-\gamma}$ for $\gamma>0$, with the exponent depending on the metric used to measure discrepancy. Our bounds are strictly stronger in terms of their dependence on network width than any previously available in the literature; in the one-dimensional case, we also prove that they are optimal, i.e., we establish matching lower bounds.
    DISCO-10M: A Large-Scale Music Dataset. (arXiv:2306.13512v2 [cs.SD] UPDATED)
    Music datasets play a crucial role in advancing research in machine learning for music. However, existing music datasets suffer from limited size, accessibility, and lack of audio resources. To address these shortcomings, we present DISCO-10M, a novel and extensive music dataset that surpasses the largest previously available music dataset by an order of magnitude. To ensure high-quality data, we implement a multi-stage filtering process. This process incorporates similarities based on textual descriptions and audio embeddings. Moreover, we provide precomputed CLAP embeddings alongside DISCO-10M, facilitating direct application on various downstream tasks. These embeddings enable efficient exploration of machine learning applications on the provided data. With DISCO-10M, we aim to democratize and facilitate new research to help advance the development of novel machine learning models for music.
    SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning. (arXiv:2308.00436v3 [cs.AI] UPDATED)
    The recent progress in large language models (LLMs), especially the invention of chain-of-thought prompting, has made it possible to automatically answer questions by stepwise reasoning. However, when faced with more complicated problems that require non-linear thinking, even the strongest LLMs make mistakes. To address this, we explore whether LLMs are able to recognize errors in their own step-by-step reasoning, without resorting to external resources. To this end, we propose SelfCheck, a general-purpose zero-shot verification schema for recognizing such errors. We then use the results of these checks to improve question-answering performance by conducting weighted voting on multiple solutions to the question. We test SelfCheck on three datasets (GSM8K, MathQA, and MATH) and find that it successfully recognizes errors and, in turn, increases final answer accuracies.
    Solving a Class of Non-Convex Minimax Optimization in Federated Learning. (arXiv:2310.03613v1 [cs.LG])
    The minimax problems arise throughout machine learning applications, ranging from adversarial training and policy evaluation in reinforcement learning to AUROC maximization. To address the large-scale data challenges across multiple clients with communication-efficient distributed training, federated learning (FL) is gaining popularity. Many optimization algorithms for minimax problems have been developed in the centralized setting (\emph{i.e.} single-machine). Nonetheless, the algorithm for minimax problems under FL is still underexplored. In this paper, we study a class of federated nonconvex minimax optimization problems. We propose FL algorithms (FedSGDA+ and FedSGDA-M) and reduce existing complexity results for the most common minimax problems. For nonconvex-concave problems, we propose FedSGDA+ and reduce the communication complexity to $O(\varepsilon^{-6})$. Under nonconvex-strongly-concave and nonconvex-PL minimax settings, we prove that FedSGDA-M has the best-known sample complexity of $O(\kappa^{3} N^{-1}\varepsilon^{-3})$ and the best-known communication complexity of $O(\kappa^{2}\varepsilon^{-2})$. FedSGDA-M is the first algorithm to match the best sample complexity $O(\varepsilon^{-3})$ achieved by the single-machine method under the nonconvex-strongly-concave setting. Extensive experimental results on fair classification and AUROC maximization show the efficiency of our algorithms.
    Multimarginal generative modeling with stochastic interpolants. (arXiv:2310.03695v1 [cs.LG])
    Given a set of $K$ probability densities, we consider the multimarginal generative modeling problem of learning a joint distribution that recovers these densities as marginals. The structure of this joint distribution should identify multi-way correspondences among the prescribed marginals. We formalize an approach to this task within a generalization of the stochastic interpolant framework, leading to efficient learning algorithms built upon dynamical transport of measure. Our generative models are defined by velocity and score fields that can be characterized as the minimizers of simple quadratic objectives, and they are defined on a simplex that generalizes the time variable in the usual dynamical transport framework. The resulting transport on the simplex is influenced by all marginals, and we show that multi-way correspondences can be extracted. The identification of such correspondences has applications to style transfer, algorithmic fairness, and data decorruption. In addition, the multimarginal perspective enables an efficient algorithm for reducing the dynamical transport cost in the ordinary two-marginal setting. We demonstrate these capacities with several numerical examples.
    Landscape-Sketch-Step: An AI/ML-Based Metaheuristic for Surrogate Optimization Problems. (arXiv:2309.07936v3 [cs.LG] UPDATED)
    In this paper, we introduce a new heuristics for global optimization in scenarios where extensive evaluations of the cost function are expensive, inaccessible, or even prohibitive. The method, which we call Landscape-Sketch-and-Step (LSS), combines Machine Learning, Stochastic Optimization, and Reinforcement Learning techniques, relying on historical information from previously sampled points to make judicious choices of parameter values where the cost function should be evaluated at. Unlike optimization by Replica Exchange Monte Carlo methods, the number of evaluations of the cost function required in this approach is comparable to that used by Simulated Annealing, quality that is especially important in contexts like high-throughput computing or high-performance computing tasks, where evaluations are either computationally expensive or take a long time to be performed. The method also differs from standard Surrogate Optimization techniques, for it does not construct a surrogate model that aims at approximating or reconstructing the objective function. We illustrate our method by applying it to low dimensional optimization problems (dimensions 1, 2, 4, and 8) that mimick known difficulties of minimization on rugged energy landscapes often seen in Condensed Matter Physics, where cost functions are rugged and plagued with local minima. When compared to classical Simulated Annealing, the LSS shows an effective acceleration of the optimization process.
    Large-scale investigation of weakly-supervised deep learning for the fine-grained semantic indexing of biomedical literature. (arXiv:2301.09350v2 [cs.CL] UPDATED)
    Objective: Semantic indexing of biomedical literature is usually done at the level of MeSH descriptors with several related but distinct biomedical concepts often grouped together and treated as a single topic. This study proposes a new method for the automated refinement of subject annotations at the level of MeSH concepts. Methods: Lacking labelled data, we rely on weak supervision based on concept occurrence in the abstract of an article, which is also enhanced by dictionary-based heuristics. In addition, we investigate deep learning approaches, making design choices to tackle the particular challenges of this task. The new method is evaluated on a large-scale retrospective scenario, based on concepts that have been promoted to descriptors. Results: In our experiments concept occurrence was the strongest heuristic achieving a macro-F1 score of about 0.63 across several labels. The proposed method improved it further by more than 4pp. Conclusion: The results suggest that concept occurrence is a strong heuristic for refining the coarse-grained labels at the level of MeSH concepts and the proposed method improves it further.
    Deep Quantum Graph Dreaming: Deciphering Neural Network Insights into Quantum Experiments. (arXiv:2309.07056v2 [quant-ph] UPDATED)
    Despite their promise to facilitate new scientific discoveries, the opaqueness of neural networks presents a challenge in interpreting the logic behind their findings. Here, we use a eXplainable-AI (XAI) technique called $inception$ or $deep$ $dreaming$, which has been invented in machine learning for computer vision. We use this technique to explore what neural networks learn about quantum optics experiments. Our story begins by training deep neural networks on the properties of quantum systems. Once trained, we "invert" the neural network -- effectively asking how it imagines a quantum system with a specific property, and how it would continuously modify the quantum system to change a property. We find that the network can shift the initial distribution of properties of the quantum system, and we can conceptualize the learned strategies of the neural network. Interestingly, we find that, in the first layers, the neural network identifies simple properties, while in the deeper ones, it can identify complex quantum structures and even quantum entanglement. This is in reminiscence of long-understood properties known in computer vision, which we now identify in a complex natural science task. Our approach could be useful in a more interpretable way to develop new advanced AI-based scientific discovery techniques in quantum physics.
    Marginalized Importance Sampling for Off-Environment Policy Evaluation. (arXiv:2309.01807v2 [cs.LG] UPDATED)
    Reinforcement Learning (RL) methods are typically sample-inefficient, making it challenging to train and deploy RL-policies in real world robots. Even a robust policy trained in simulation requires a real-world deployment to assess their performance. This paper proposes a new approach to evaluate the real-world performance of agent policies prior to deploying them in the real world. Our approach incorporates a simulator along with real-world offline data to evaluate the performance of any policy using the framework of Marginalized Importance Sampling (MIS). Existing MIS methods face two challenges: (1) large density ratios that deviate from a reasonable range and (2) indirect supervision, where the ratio needs to be inferred indirectly, thus exacerbating estimation error. Our approach addresses these challenges by introducing the target policy's occupancy in the simulator as an intermediate variable and learning the density ratio as the product of two terms that can be learned separately. The first term is learned with direct supervision and the second term has a small magnitude, thus making it computationally efficient. We analyze the sample complexity as well as error propagation of our two step-procedure. Furthermore, we empirically evaluate our approach on Sim2Sim environments such as Cartpole, Reacher, and Half-Cheetah. Our results show that our method generalizes well across a variety of Sim2Sim gap, target policies and offline data collection policies. We also demonstrate the performance of our algorithm on a Sim2Real task of validating the performance of a 7 DoF robotic arm using offline data along with the Gazebo simulator.
    IBCL: Zero-shot Model Generation for Task Trade-offs in Continual Learning. (arXiv:2310.02995v2 [cs.LG] UPDATED)
    Like generic multi-task learning, continual learning has the nature of multi-objective optimization, and therefore faces a trade-off between the performance of different tasks. That is, to optimize for the current task distribution, it may need to compromise performance on some previous tasks. This means that there exist multiple models that are Pareto-optimal at different times, each addressing a distinct task performance trade-off. Researchers have discussed how to train particular models to address specific trade-off preferences. However, existing algorithms require training overheads proportional to the number of preferences -- a large burden when there are multiple, possibly infinitely many, preferences. As a response, we propose Imprecise Bayesian Continual Learning (IBCL). Upon a new task, IBCL (1) updates a knowledge base in the form of a convex hull of model parameter distributions and (2) obtains particular models to address task trade-off preferences with zero-shot. That is, IBCL does not require any additional training overhead to generate preference-addressing models from its knowledge base. We show that models obtained by IBCL have guarantees in identifying the Pareto optimal parameters. Moreover, experiments on standard image classification and NLP tasks support this guarantee. Statistically, IBCL improves average per-task accuracy by at most 23\% and peak per-task accuracy by at most 15\% with respect to the baseline methods, with steadily near-zero or positive backward transfer. Most importantly, IBCL significantly reduces the training overhead from training 1 model per preference to at most 3 models for all preferences.
    Borges and AI. (arXiv:2310.01425v2 [cs.CL] UPDATED)
    Many believe that Large Language Models (LLMs) open the era of Artificial Intelligence (AI). Some see opportunities while others see dangers. Yet both proponents and opponents grasp AI through the imagery popularised by science fiction. Will the machine become sentient and rebel against its creators? Will we experience a paperclip apocalypse? Before answering such questions, we should first ask whether this mental imagery provides a good description of the phenomenon at hand. Understanding weather patterns through the moods of the gods only goes so far. The present paper instead advocates understanding LLMs and their connection to AI through the imagery of Jorge Luis Borges, a master of 20th century literature, forerunner of magical realism, and precursor to postmodern literature. This exercise leads to a new perspective that illuminates the relation between language modelling and artificial intelligence.
    BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection. (arXiv:2308.12439v2 [cs.CR] UPDATED)
    We present a novel defense, against backdoor attacks on Deep Neural Networks (DNNs), wherein adversaries covertly implant malicious behaviors (backdoors) into DNNs. Our defense falls within the category of post-development defenses that operate independently of how the model was generated. The proposed defense is built upon a novel reverse engineering approach that can directly extract backdoor functionality of a given backdoored model to a backdoor expert model. The approach is straightforward -- finetuning the backdoored model over a small set of intentionally mislabeled clean samples, such that it unlearns the normal functionality while still preserving the backdoor functionality, and thus resulting in a model (dubbed a backdoor expert model) that can only recognize backdoor inputs. Based on the extracted backdoor expert model, we show the feasibility of devising highly accurate backdoor input detectors that filter out the backdoor inputs during model inference. Further augmented by an ensemble strategy with a finetuned auxiliary model, our defense, BaDExpert (Backdoor Input Detection with Backdoor Expert), effectively mitigates 17 SOTA backdoor attacks while minimally impacting clean utility. The effectiveness of BaDExpert has been verified on multiple datasets (CIFAR10, GTSRB and ImageNet) across various model architectures (ResNet, VGG, MobileNetV2 and Vision Transformer).
    Formally Explaining Neural Networks within Reactive Systems. (arXiv:2308.00143v3 [cs.AI] UPDATED)
    Deep neural networks (DNNs) are increasingly being used as controllers in reactive systems. However, DNNs are highly opaque, which renders it difficult to explain and justify their actions. To mitigate this issue, there has been a surge of interest in explainable AI (XAI) techniques, capable of pinpointing the input features that caused the DNN to act as it did. Existing XAI techniques typically face two limitations: (i) they are heuristic, and do not provide formal guarantees that the explanations are correct; and (ii) they often apply to ``one-shot'' systems, where the DNN is invoked independently of past invocations, as opposed to reactive systems. Here, we begin bridging this gap, and propose a formal DNN-verification-based XAI technique for reasoning about multi-step, reactive systems. We suggest methods for efficiently calculating succinct explanations, by exploiting the system's transition constraints in order to curtail the search space explored by the underlying verifier. We evaluate our approach on two popular benchmarks from the domain of automated navigation; and observe that our methods allow the efficient computation of minimal and minimum explanations, significantly outperforming the state of the art. We also demonstrate that our methods produce formal explanations that are more reliable than competing, non-verification-based XAI techniques.
    Probabilistically Rewired Message-Passing Neural Networks. (arXiv:2310.02156v2 [cs.LG] UPDATED)
    Message-passing graph neural networks (MPNNs) emerged as powerful tools for processing graph-structured input. However, they operate on a fixed input graph structure, ignoring potential noise and missing information. Furthermore, their local aggregation mechanism can lead to problems such as over-squashing and limited expressive power in capturing relevant graph structures. Existing solutions to these challenges have primarily relied on heuristic methods, often disregarding the underlying data distribution. Hence, devising principled approaches for learning to infer graph structures relevant to the given prediction task remains an open challenge. In this work, leveraging recent progress in exact and differentiable $k$-subset sampling, we devise probabilistically rewired MPNNs (PR-MPNNs), which learn to add relevant edges while omitting less beneficial ones. For the first time, our theoretical analysis explores how PR-MPNNs enhance expressive power, and we identify precise conditions under which they outperform purely randomized approaches. Empirically, we demonstrate that our approach effectively mitigates issues like over-squashing and under-reaching. In addition, on established real-world datasets, our method exhibits competitive or superior predictive performance compared to traditional MPNN models and recent graph transformer architectures.
    Transferring Annotator- and Instance-dependent Transition Matrix for Learning from Crowds. (arXiv:2306.03116v2 [cs.HC] UPDATED)
    Learning from crowds describes that the annotations of training data are obtained with crowd-sourcing services. Multiple annotators each complete their own small part of the annotations, where labeling mistakes that depend on annotators occur frequently. Modeling the label-noise generation process by the noise transition matrix is a power tool to tackle the label noise. In real-world crowd-sourcing scenarios, noise transition matrices are both annotator- and instance-dependent. However, due to the high complexity of annotator- and instance-dependent transition matrices (AIDTM), annotation sparsity, which means each annotator only labels a little part of instances, makes modeling AIDTM very challenging. Prior works simplify the problem by assuming the transition matrix is instance-independent or using simple parametric ways, which lose modeling generality. Motivated by this, we target a more realistic problem, estimating general AIDTM in practice. Without losing modeling generality, we parameterize AIDTM with deep neural networks. To alleviate the modeling challenge, we suppose every annotator shares its noise pattern with similar annotators, and estimate AIDTM via knowledge transfer. We hence first model the mixture of noise patterns by all annotators, and then transfer this modeling to individual annotators. Furthermore, considering that the transfer from the mixture of noise patterns to individuals may cause two annotators with highly different noise generations to perturb each other, we employ the knowledge transfer between identified neighboring annotators to calibrate the modeling. Theoretical analyses are derived to demonstrate that both the knowledge transfer from global to individuals and the knowledge transfer between neighboring individuals can help model general AIDTM. Experiments confirm the superiority of the proposed approach on synthetic and real-world crowd-sourcing data.
    Burning the Adversarial Bridges: Robust Windows Malware Detection Against Binary-level Mutations. (arXiv:2310.03285v1 [cs.LG])
    Toward robust malware detection, we explore the attack surface of existing malware detection systems. We conduct root-cause analyses of the practical binary-level black-box adversarial malware examples. Additionally, we uncover the sensitivity of volatile features within the detection engines and exhibit their exploitability. Highlighting volatile information channels within the software, we introduce three software pre-processing steps to eliminate the attack surface, namely, padding removal, software stripping, and inter-section information resetting. Further, to counter the emerging section injection attacks, we propose a graph-based section-dependent information extraction scheme for software representation. The proposed scheme leverages aggregated information within various sections in the software to enable robust malware detection and mitigate adversarial settings. Our experimental results show that traditional malware detection models are ineffective against adversarial threats. However, the attack surface can be largely reduced by eliminating the volatile information. Therefore, we propose simple-yet-effective methods to mitigate the impacts of binary manipulation attacks. Overall, our graph-based malware detection scheme can accurately detect malware with an area under the curve score of 88.32\% and a score of 88.19% under a combination of binary manipulation attacks, exhibiting the efficiency of our proposed scheme.
    Multiple Case Physics-Informed Neural Network for Biomedical Tube Flows. (arXiv:2309.15294v2 [physics.flu-dyn] UPDATED)
    Fluid dynamics computations for tube-like geometries are important for biomedical evaluation of vascular and airway fluid dynamics. Physics-Informed Neural Networks (PINNs) have recently emerged as a good alternative to traditional computational fluid dynamics (CFD) methods. The vanilla PINN, however, requires much longer training time than the traditional CFD methods for each specific flow scenario and thus does not justify its mainstream use. Here, we explore the use of the multi-case PINN approach for calculating biomedical tube flows, where varied geometry cases are parameterized and pre-trained on the PINN, such that results for unseen geometries can be obtained in real time. Our objective is to identify network architecture, tube-specific, and regularization strategies that can optimize this, via experiments on a series of idealized 2D stenotic tube flows.
    Rayleigh Quotient Graph Neural Networks for Graph-level Anomaly Detection. (arXiv:2310.02861v2 [cs.LG] UPDATED)
    Graph-level anomaly detection has gained significant attention as it finds many applications in various domains, such as cancer diagnosis and enzyme prediction. However, existing methods fail to capture the underlying properties of graph anomalies, resulting in unexplainable framework design and unsatisfying performance. In this paper, we take a step back and re-investigate the spectral differences between anomalous and normal graphs. Our main observation shows a significant disparity in the accumulated spectral energy between these two classes. Moreover, we prove that the accumulated spectral energy of the graph signal can be represented by its Rayleigh Quotient, indicating that the Rayleigh Quotient is a driving factor behind the anomalous properties of graphs. Motivated by this, we propose Rayleigh Quotient Graph Neural Network (RQGNN), the first spectral GNN for graph-level anomaly detection, providing a new perspective on exploring the inherent spectral features of anomalous graphs. Specifically, we introduce a novel framework that consists of two components: the Rayleigh Quotient learning component (RQL) and Chebyshev Wavelet GNN with RQ-pooling (CWGNN-RQ). RQL explicitly captures the Rayleigh Quotient of graphs and CWGNN-RQ implicitly explores the spectral space of graphs. Extensive experiments on 10 real-world datasets show that RQGNN outperforms the best rival by 6.74% in Macro-F1 score and 1.44% in AUC, demonstrating the effectiveness of our framework.
    PINNacle: A Comprehensive Benchmark of Physics-Informed Neural Networks for Solving PDEs. (arXiv:2306.08827v2 [cs.LG] UPDATED)
    While significant progress has been made on Physics-Informed Neural Networks (PINNs), a comprehensive comparison of these methods across a wide range of Partial Differential Equations (PDEs) is still lacking. This study introduces PINNacle, a benchmarking tool designed to fill this gap. PINNacle provides a diverse dataset, comprising over 20 distinct PDEs from various domains, including heat conduction, fluid dynamics, biology, and electromagnetics. These PDEs encapsulate key challenges inherent to real-world problems, such as complex geometry, multi-scale phenomena, nonlinearity, and high dimensionality. PINNacle also offers a user-friendly toolbox, incorporating about 10 state-of-the-art PINN methods for systematic evaluation and comparison. We have conducted extensive experiments with these methods, offering insights into their strengths and weaknesses. In addition to providing a standardized means of assessing performance, PINNacle also offers an in-depth analysis to guide future research, particularly in areas such as domain decomposition methods and loss reweighting for handling multi-scale problems and complex geometry. To the best of our knowledge, it is the largest benchmark with a diverse and comprehensive evaluation that will undoubtedly foster further research in PINNs.
    Reconstructing Existing Levels through Level Inpainting. (arXiv:2309.09472v3 [cs.CV] UPDATED)
    Procedural Content Generation (PCG) and Procedural Content Generation via Machine Learning (PCGML) have been used in prior work for generating levels in various games. This paper introduces Content Augmentation and focuses on the subproblem of level inpainting, which involves reconstructing and extending video game levels. Drawing inspiration from image inpainting, we adapt two techniques from this domain to address our specific use case. We present two approaches for level inpainting: an Autoencoder and a U-net. Through a comprehensive case study, we demonstrate their superior performance compared to a baseline method and discuss their relative merits. Furthermore, we provide a practical demonstration of both approaches for the level inpainting task and offer insights into potential directions for future research.
    Sampling via Gradient Flows in the Space of Probability Measures. (arXiv:2310.03597v1 [stat.ML])
    Sampling a target probability distribution with an unknown normalization constant is a fundamental challenge in computational science and engineering. Recent work shows that algorithms derived by considering gradient flows in the space of probability measures open up new avenues for algorithm development. This paper makes three contributions to this sampling approach by scrutinizing the design components of such gradient flows. Any instantiation of a gradient flow for sampling needs an energy functional and a metric to determine the flow, as well as numerical approximations of the flow to derive algorithms. Our first contribution is to show that the Kullback-Leibler divergence, as an energy functional, has the unique property (among all f-divergences) that gradient flows resulting from it do not depend on the normalization constant of the target distribution. Our second contribution is to study the choice of metric from the perspective of invariance. The Fisher-Rao metric is known as the unique choice (up to scaling) that is diffeomorphism invariant. As a computationally tractable alternative, we introduce a relaxed, affine invariance property for the metrics and gradient flows. In particular, we construct various affine invariant Wasserstein and Stein gradient flows. Affine invariant gradient flows are shown to behave more favorably than their non-affine-invariant counterparts when sampling highly anisotropic distributions, in theory and by using particle methods. Our third contribution is to study, and develop efficient algorithms based on Gaussian approximations of the gradient flows; this leads to an alternative to particle methods. We establish connections between various Gaussian approximate gradient flows, discuss their relation to gradient methods arising from parametric variational inference, and study their convergence properties both theoretically and numerically.
    Generative models for two-ground-truth partitions in networks. (arXiv:2302.02787v3 [cs.SI] UPDATED)
    A myriad of approaches have been proposed to characterise the mesoscale structure of networks - most often as a partition based on patterns variously called communities, blocks, or clusters. Clearly, distinct methods designed to detect different types of patterns may provide a variety of answers to the network's mesoscale structure. Yet, even multiple runs of a given method can sometimes yield diverse and conflicting results, producing entire landscapes of partitions which potentially include multiple (locally optimal) mesoscale explanations of the network. Such ambiguity motivates a closer look at the ability of these methods to find multiple qualitatively different 'ground truth' partitions in a network. Here, we propose the stochastic cross-block model (SCBM), a generative model which allows for two distinct partitions to be built into the mesoscale structure of a single benchmark network. We demonstrate a use case of the benchmark model by appraising the power of stochastic block models (SBMs) to detect implicitly planted coexisting bi-community and core-periphery structures of different strengths. Given our model design and experimental set-up, we find that the ability to detect the two partitions individually varies by SBM variant and that coexistence of both partitions is recovered only in a very limited number of cases. Our findings suggest that in most instances only one - in some way dominating - structure can be detected, even in the presence of other partitions. They underline the need for considering entire landscapes of partitions when different competing explanations exist and motivate future research to advance partition coexistence detection methods. Our model also contributes to the field of benchmark networks more generally by enabling further exploration of the ability of new and existing methods to detect ambiguity in the mesoscale structure of networks.
    Numerical Weather Forecasting using Convolutional-LSTM with Attention and Context Matcher Mechanisms. (arXiv:2102.00696v2 [cs.LG] UPDATED)
    Numerical weather forecasting using high-resolution physical models often requires extensive computational resources on supercomputers, which diminishes their wide usage in most real-life applications. As a remedy, applying deep learning methods has revealed innovative solutions within this field. To this end, we introduce a novel deep learning architecture for forecasting high-resolution spatio-temporal weather data. Our approach extends the conventional encoder-decoder structure by integrating Convolutional Long-short Term Memory and Convolutional Neural Networks. In addition, we incorporate attention and context matcher mechanisms into the model architecture. Our Weather Model achieves significant performance improvements compared to baseline deep learning models, including ConvLSTM, TrajGRU, and U-Net. Our experimental evaluation involves high-scale, real-world benchmark numerical weather datasets, namely the ERA5 hourly dataset on pressure levels and WeatherBench. Our results demonstrate substantial improvements in identifying spatial and temporal correlations with attention matrices focusing on distinct parts of the input series to model atmospheric circulations. We also compare our model with high-resolution physical models using the benchmark metrics and show that our Weather Model is accurate and easy to interpret.
    Spatial-temporal associations representation and application for process monitoring using graph convolution neural network. (arXiv:2205.05250v2 [cs.LG] UPDATED)
    Thank you very much for the attention and concern of colleagues and scholars in this work. With the comments and guidance of experts, editors, and reviewers, this work has been accepted for publishing in the journal "Process Safety and Environmental Protection". The theme of this paper relies on the Spatial-temporal associations of numerous variables in the same industrial processes, which refers to numerous variables obtained in dynamic industrial processes with Spatial-temporal correlation characteristics, i.e., these variables are not only highly correlated in time but also interrelated in space. To handle this problem, three key issues need to be well addressed: variable characteristics modeling and representation, graph network construction (temporal information), and graph characteristics perception. The first issue is implemented by assuming the data follows one improved Gaussian distribution, while the graph network can be defined by the monitoring variables and their edges which are calculated by their characteristics in time. Finally, these networks corresponding to process states at different times are fed into a graph convolutional neural network to implement graph classification to achieve process monitoring. A benchmark experiment (Tennessee Eastman chemical process) and one application study (cobalt purification from zinc solution) are employed to demonstrate the feasibility and applicability of this paper.
    Disentangling the Link Between Image Statistics and Human Perception. (arXiv:2303.09874v3 [cs.CV] UPDATED)
    In the 1950s, Barlow and Attneave hypothesised a link between biological vision and information maximisation. Following Shannon, information was defined using the probability of natural images. A number of physiological and psychophysical phenomena have been derived ever since from principles like info-max, efficient coding, or optimal denoising. However, it remains unclear how this link is expressed in mathematical terms from image probability. First, classical derivations were subjected to strong assumptions on the probability models and on the behaviour of the sensors. Moreover, the direct evaluation of the hypothesis was limited by the inability of the classical image models to deliver accurate estimates of the probability. In this work we directly evaluate image probabilities using an advanced generative model for natural images, and we analyse how probability-related factors can be combined to predict human perception via sensitivity of state-of-the-art subjective image quality metrics. We use information theory and regression analysis to find a combination of just two probability-related factors that achieves 0.8 correlation with subjective metrics. This probability-based sensitivity is psychophysically validated by reproducing the basic trends of the Contrast Sensitivity Function, its suprathreshold variation, and trends of the Weber-law and masking.
    OMG-ATTACK: Self-Supervised On-Manifold Generation of Transferable Evasion Attacks. (arXiv:2310.03707v1 [cs.LG])
    Evasion Attacks (EA) are used to test the robustness of trained neural networks by distorting input data to misguide the model into incorrect classifications. Creating these attacks is a challenging task, especially with the ever-increasing complexity of models and datasets. In this work, we introduce a self-supervised, computationally economical method for generating adversarial examples, designed for the unseen black-box setting. Adapting techniques from representation learning, our method generates on-manifold EAs that are encouraged to resemble the data distribution. These attacks are comparable in effectiveness compared to the state-of-the-art when attacking the model trained on, but are significantly more effective when attacking unseen models, as the attacks are more related to the data rather than the model itself. Our experiments consistently demonstrate the method is effective across various models, unseen data categories, and even defended models, suggesting a significant role for on-manifold EAs when targeting unseen models.
    Efficient Biologically Plausible Adversarial Training. (arXiv:2309.17348v3 [cs.LG] UPDATED)
    Artificial Neural Networks (ANNs) trained with Backpropagation (BP) show astounding performance and are increasingly often used in performing our daily life tasks. However, ANNs are highly vulnerable to adversarial attacks, which alter inputs with small targeted perturbations that drastically disrupt the models' performance. The most effective method to make ANNs robust against these attacks is adversarial training, in which the training dataset is augmented with exemplary adversarial samples. Unfortunately, this approach has the drawback of increased training complexity since generating adversarial samples is very computationally demanding. In contrast to ANNs, humans are not susceptible to adversarial attacks. Therefore, in this work, we investigate whether biologically-plausible learning algorithms are more robust against adversarial attacks than BP. In particular, we present an extensive comparative analysis of the adversarial robustness of BP and Present the Error to Perturb the Input To modulate Activity (PEPITA), a recently proposed biologically-plausible learning algorithm, on various computer vision tasks. We observe that PEPITA has higher intrinsic adversarial robustness and, with adversarial training, has a more favourable natural-vs-adversarial performance trade-off as, for the same natural accuracies, PEPITA's adversarial accuracies decrease in average by 0.26% and BP's by 8.05%.
    LoRA ensembles for large language model fine-tuning. (arXiv:2310.00035v2 [cs.LG] UPDATED)
    Finetuned LLMs often exhibit poor uncertainty quantification, manifesting as overconfidence, poor calibration, and unreliable prediction results on test data or out-of-distribution samples. One approach commonly used in vision for alleviating this issue is a deep ensemble, which constructs an ensemble by training the same model multiple times using different random initializations. However, there is a huge challenge to ensembling LLMs: the most effective LLMs are very, very large. Keeping a single LLM in memory is already challenging enough: keeping an ensemble of e.g. 5 LLMs in memory is impossible in many settings. To address these issues, we propose an ensemble approach using Low-Rank Adapters (LoRA), a parameter-efficient fine-tuning technique. Critically, these low-rank adapters represent a very small number of parameters, orders of magnitude less than the underlying pre-trained model. Thus, it is possible to construct large ensembles of LoRA adapters with almost the same computational overhead as using the original model. We find that LoRA ensembles, applied on its own or on top of pre-existing regularization techniques, gives consistent improvements in predictive accuracy and uncertainty quantification.
    Learning Robust Statistics for Simulation-based Inference under Model Misspecification. (arXiv:2305.15871v3 [stat.ML] UPDATED)
    Simulation-based inference (SBI) methods such as approximate Bayesian computation (ABC), synthetic likelihood, and neural posterior estimation (NPE) rely on simulating statistics to infer parameters of intractable likelihood models. However, such methods are known to yield untrustworthy and misleading inference outcomes under model misspecification, thus hindering their widespread applicability. In this work, we propose the first general approach to handle model misspecification that works across different classes of SBI methods. Leveraging the fact that the choice of statistics determines the degree of misspecification in SBI, we introduce a regularized loss function that penalises those statistics that increase the mismatch between the data and the model. Taking NPE and ABC as use cases, we demonstrate the superior performance of our method on high-dimensional time-series models that are artificially misspecified. We also apply our method to real data from the field of radio propagation where the model is known to be misspecified. We show empirically that the method yields robust inference in misspecified scenarios, whilst still being accurate when the model is well-specified.
    Analysis of learning a flow-based generative model from limited sample complexity. (arXiv:2310.03575v1 [stat.ML])
    We study the problem of training a flow-based generative model, parametrized by a two-layer autoencoder, to sample from a high-dimensional Gaussian mixture. We provide a sharp end-to-end analysis of the problem. First, we provide a tight closed-form characterization of the learnt velocity field, when parametrized by a shallow denoising auto-encoder trained on a finite number $n$ of samples from the target distribution. Building on this analysis, we provide a sharp description of the corresponding generative flow, which pushes the base Gaussian density forward to an approximation of the target density. In particular, we provide closed-form formulae for the distance between the mean of the generated mixture and the mean of the target mixture, which we show decays as $\Theta_n(\frac{1}{n})$. Finally, this rate is shown to be in fact Bayes-optimal.
    A Comprehensive Survey of Dataset Distillation. (arXiv:2301.05603v3 [cs.LG] UPDATED)
    Deep learning technology has developed unprecedentedly in the last decade and has become the primary choice in many application domains. This progress is mainly attributed to a systematic collaboration in which rapidly growing computing resources encourage advanced algorithms to deal with massive data. However, it has gradually become challenging to handle the unlimited growth of data with limited computing power. To this end, diverse approaches are proposed to improve data processing efficiency. Dataset distillation, a dataset reduction method, addresses this problem by synthesizing a small typical dataset from substantial data and has attracted much attention from the deep learning community. Existing dataset distillation methods can be taxonomized into meta-learning and data matching frameworks according to whether they explicitly mimic the performance of target data. Although dataset distillation has shown surprising performance in compressing datasets, there are still several limitations such as distilling high-resolution data or data with complex label spaces. This paper provides a holistic understanding of dataset distillation from multiple aspects, including distillation frameworks and algorithms, factorized dataset distillation, performance comparison, and applications. Finally, we discuss challenges and promising directions to further promote future studies on dataset distillation.
    Adversarial Machine Learning for Social Good: Reframing the Adversary as an Ally. (arXiv:2310.03614v1 [cs.LG])
    Deep Neural Networks (DNNs) have been the driving force behind many of the recent advances in machine learning. However, research has shown that DNNs are vulnerable to adversarial examples -- input samples that have been perturbed to force DNN-based models to make errors. As a result, Adversarial Machine Learning (AdvML) has gained a lot of attention, and researchers have investigated these vulnerabilities in various settings and modalities. In addition, DNNs have also been found to incorporate embedded bias and often produce unexplainable predictions, which can result in anti-social AI applications. The emergence of new AI technologies that leverage Large Language Models (LLMs), such as ChatGPT and GPT-4, increases the risk of producing anti-social applications at scale. AdvML for Social Good (AdvML4G) is an emerging field that repurposes the AdvML bug to invent pro-social applications. Regulators, practitioners, and researchers should collaborate to encourage the development of pro-social applications and hinder the development of anti-social ones. In this work, we provide the first comprehensive review of the emerging field of AdvML4G. This paper encompasses a taxonomy that highlights the emergence of AdvML4G, a discussion of the differences and similarities between AdvML4G and AdvML, a taxonomy covering social good-related concepts and aspects, an exploration of the motivations behind the emergence of AdvML4G at the intersection of ML4G and AdvML, and an extensive summary of the works that utilize AdvML4G as an auxiliary tool for innovating pro-social applications. Finally, we elaborate upon various challenges and open research issues that require significant attention from the research community.
    Decoding speech perception from non-invasive brain recordings. (arXiv:2208.12266v2 [eess.AS] UPDATED)
    Decoding speech from brain activity is a long-awaited goal in both healthcare and neuroscience. Invasive devices have recently led to major milestones in that regard: deep learning algorithms trained on intracranial recordings now start to decode elementary linguistic features (e.g. letters, words, spectrograms). However, extending this approach to natural speech and non-invasive brain recordings remains a major challenge. Here, we introduce a model trained with contrastive-learning to decode self-supervised representations of perceived speech from the non-invasive recordings of a large cohort of healthy individuals. To evaluate this approach, we curate and integrate four public datasets, encompassing 175 volunteers recorded with magneto- or electro-encephalography (M/EEG), while they listened to short stories and isolated sentences. The results show that our model can identify, from 3 seconds of MEG signals, the corresponding speech segment with up to 41% accuracy out of more than 1,000 distinct possibilities on average across participants, and more than 80% in the very best participants - a performance that allows the decoding of words and phrases absent from the training set. The comparison of our model to a variety of baselines highlights the importance of (i) a contrastive objective, (ii) pretrained representations of speech and (iii) a common convolutional architecture simultaneously trained across multiple participants. Finally, the analysis of the decoder's predictions suggests that they primarily depend on lexical and contextual semantic representations. Overall, this effective decoding of perceived speech from non-invasive recordings delineates a promising path to decode language from brain activity, without putting patients at risk for brain surgery.
    Regression with Label Differential Privacy. (arXiv:2212.06074v3 [cs.LG] UPDATED)
    We study the task of training regression models with the guarantee of label differential privacy (DP). Based on a global prior distribution on label values, which could be obtained privately, we derive a label DP randomization mechanism that is optimal under a given regression loss function. We prove that the optimal mechanism takes the form of a "randomized response on bins", and propose an efficient algorithm for finding the optimal bin values. We carry out a thorough experimental evaluation on several datasets demonstrating the efficacy of our algorithm.
    Unsupervised Foreground Extraction via Deep Region Competition. (arXiv:2110.15497v4 [cs.CV] UPDATED)
    We present Deep Region Competition (DRC), an algorithm designed to extract foreground objects from images in a fully unsupervised manner. Foreground extraction can be viewed as a special case of generic image segmentation that focuses on identifying and disentangling objects from the background. In this work, we rethink the foreground extraction by reconciling energy-based prior with generative image modeling in the form of Mixture of Experts (MoE), where we further introduce the learned pixel re-assignment as the essential inductive bias to capture the regularities of background regions. With this modeling, the foreground-background partition can be naturally found through Expectation-Maximization (EM). We show that the proposed method effectively exploits the interaction between the mixture components during the partitioning process, which closely connects to region competition, a seminal approach for generic image segmentation. Experiments demonstrate that DRC exhibits more competitive performances on complex real-world data and challenging multi-object scenes compared with prior methods. Moreover, we show empirically that DRC can potentially generalize to novel foreground objects even from categories unseen during training.
    Algebraic and Geometric Models for Space Networking. (arXiv:2304.01150v2 [math.AT] UPDATED)
    In this paper we introduce some new algebraic and geometric perspectives on networked space communications. Our main contribution is a novel definition of a time-varying graph (TVG), defined in terms of a matrix with values in subsets of the real line P(R). We leverage semi-ring properties of P(R) to model multi-hop communication in a TVG using matrix multiplication and a truncated Kleene star. This leads to novel statistics on the communication capacity of TVGs called lifetime curves, which we generate for large samples of randomly chosen STARLINK satellites, whose connectivity is modeled over day-long simulations. Determining when a large subsample of STARLINK is temporally strongly connected is further analyzed using novel metrics introduced here that are inspired by topological data analysis (TDA). To better model networking scenarios between the Earth and Mars, we introduce various semi-rings capable of modeling propagation delay as well as protocols common to Delay Tolerant Networking (DTN), such as store-and-forward. Finally, we illustrate the applicability of zigzag persistence for featurizing different space networks and demonstrate the efficacy of K-Nearest Neighbors (KNN) classification for distinguishing Earth-Mars and Earth-Moon satellite systems using time-varying topology alone.
    Linking Across Data Granularity: Fitting Multivariate Hawkes Processes to Partially Interval-Censored Data. (arXiv:2111.02062v3 [cs.LG] UPDATED)
    The multivariate Hawkes process (MHP) is widely used for analyzing data streams that interact with each other, where events generate new events within their own dimension (via self-excitation) or across different dimensions (via cross-excitation). However, in certain applications, the timestamps of individual events in some dimensions are unobservable, and only event counts within intervals are known, referred to as partially interval-censored data. The MHP is unsuitable for handling such data since its estimation requires event timestamps. In this study, we introduce the Partial Mean Behavior Poisson (PMBP) process, a novel point process which shares parameter equivalence with the MHP and can effectively model both timestamped and interval-censored data. We demonstrate the capabilities of the PMBP process using synthetic and real-world datasets. Firstly, we illustrate that the PMBP process can approximate MHP parameters and recover the spectral radius using synthetic event histories. Next, we assess the performance of the PMBP process in predicting YouTube popularity and find that it surpasses state-of-the-art methods. Lastly, we leverage the PMBP process to gain qualitative insights from a dataset comprising daily COVID-19 case counts from multiple countries and COVID-19-related news articles. By clustering the PMBP-modeled countries, we unveil hidden interaction patterns between occurrences of COVID-19 cases and news reporting.
    Unpaired Image-to-Image Translation via Neural Schr\"odinger Bridge. (arXiv:2305.15086v2 [cs.CV] UPDATED)
    Diffusion models are a powerful class of generative models which simulate stochastic differential equations (SDEs) to generate data from noise. Although diffusion models have achieved remarkable progress in recent years, they have limitations in the unpaired image-to-image translation tasks due to the Gaussian prior assumption. Schr\"odinger Bridge (SB), which learns an SDE to translate between two arbitrary distributions, have risen as an attractive solution to this problem. However, none of SB models so far have been successful at unpaired translation between high-resolution images. In this work, we propose the Unpaired Neural Schr\"odinger Bridge (UNSB), which expresses SB problem as a sequence of adversarial learning problems. This allows us to incorporate advanced discriminators and regularization to learn a SB between unpaired data. We demonstrate that UNSB is scalable and successfully solves various unpaired image-to-image translation tasks. Code: \url{https://github.com/cyclomon/UNSB}
    Distribution-free risk assessment of regression-based machine learning algorithms. (arXiv:2310.03545v1 [cs.LG])
    Machine learning algorithms have grown in sophistication over the years and are increasingly deployed for real-life applications. However, when using machine learning techniques in practical settings, particularly in high-risk applications such as medicine and engineering, obtaining the failure probability of the predictive model is critical. We refer to this problem as the risk-assessment task. We focus on regression algorithms and the risk-assessment task of computing the probability of the true label lying inside an interval defined around the model's prediction. We solve the risk-assessment problem using the conformal prediction approach, which provides prediction intervals that are guaranteed to contain the true label with a given probability. Using this coverage property, we prove that our approximated failure probability is conservative in the sense that it is not lower than the true failure probability of the ML algorithm. We conduct extensive experiments to empirically study the accuracy of the proposed method for problems with and without covariate shift. Our analysis focuses on different modeling regimes, dataset sizes, and conformal prediction methodologies.
    Time-Varying Propensity Score to Bridge the Gap between the Past and Present. (arXiv:2210.01422v4 [cs.LG] UPDATED)
    Real-world deployment of machine learning models is challenging because data evolves over time. While no model can work when data evolves in an arbitrary fashion, if there is some pattern to these changes, we might be able to design methods to address it. This paper addresses situations when data evolves gradually. We introduce a time-varying propensity score that can detect gradual shifts in the distribution of data which allows us to selectively sample past data to update the model -- not just similar data from the past like that of a standard propensity score but also data that evolved in a similar fashion in the past. The time-varying propensity score is quite general: we demonstrate different ways of implementing it and evaluate it on a variety of problems ranging from supervised learning (e.g., image classification problems) where data undergoes a sequence of gradual shifts, to reinforcement learning tasks (e.g., robotic manipulation and continuous control) where data shifts as the policy or the task changes.
    On Convergence of Federated Averaging Langevin Dynamics. (arXiv:2112.05120v4 [stat.ML] UPDATED)
    We propose a federated averaging Langevin algorithm (FA-LD) for uncertainty quantification and mean predictions with distributed clients. In particular, we generalize beyond normal posterior distributions and consider a general class of models. We develop theoretical guarantees for FA-LD for strongly log-concave distributions with non-i.i.d data and study how the injected noise and the stochastic-gradient noise, the heterogeneity of data, and the varying learning rates affect the convergence. Such an analysis sheds light on the optimal choice of local updates to minimize communication costs. Important to our approach is that the communication efficiency does not deteriorate with the injected noise in the Langevin algorithms. In addition, we examine in our FA-LD algorithm both independent and correlated noise used over different clients. We observe there is a trade-off between the pairs among communication, accuracy, and data privacy. As local devices may become inactive in federated networks, we also show convergence results based on different averaging schemes where only partial device updates are available. In such a case, we discover an additional bias that does not decay to zero.
    TPDR: A Novel Two-Step Transformer-based Product and Class Description Match and Retrieval Method. (arXiv:2310.03491v1 [cs.IR])
    There is a niche of companies responsible for intermediating the purchase of large batches of varied products for other companies, for which the main challenge is to perform product description standardization, i.e., matching an item described by a client with a product described in a catalog. The problem is complex since the client's product description may be: (1) potentially noisy; (2) short and uninformative (e.g., missing information about model and size); and (3) cross-language. In this paper, we formalize this problem as a ranking task: given an initial client product specification (query), return the most appropriate standardized descriptions (response). In this paper, we propose TPDR, a two-step Transformer-based Product and Class Description Retrieval method that is able to explore the semantic correspondence between IS and SD, by exploiting attention mechanisms and contrastive learning. First, TPDR employs the transformers as two encoders sharing the embedding vector space: one for encoding the IS and another for the SD, in which corresponding pairs (IS, SD) must be close in the vector space. Closeness is further enforced by a contrastive learning mechanism leveraging a specialized loss function. TPDR also exploits a (second) re-ranking step based on syntactic features that are very important for the exact matching (model, dimension) of certain products that may have been neglected by the transformers. To evaluate our proposal, we consider 11 datasets from a real company, covering different application contexts. Our solution was able to retrieve the correct standardized product before the 5th ranking position in 71% of the cases and its correct category in the first position in 80% of the situations. Moreover, the effectiveness gains over purely syntactic or semantic baselines reach up to 3.7 times, solving cases that none of the approaches in isolation can do by themselves.
    Plug-and-Play Posterior Sampling under Mismatched Measurement and Prior Models. (arXiv:2310.03546v1 [stat.ML])
    Posterior sampling has been shown to be a powerful Bayesian approach for solving imaging inverse problems. The recent plug-and-play unadjusted Langevin algorithm (PnP-ULA) has emerged as a promising method for Monte Carlo sampling and minimum mean squared error (MMSE) estimation by combining physical measurement models with deep-learning priors specified using image denoisers. However, the intricate relationship between the sampling distribution of PnP-ULA and the mismatched data-fidelity and denoiser has not been theoretically analyzed. We address this gap by proposing a posterior-L2 pseudometric and using it to quantify an explicit error bound for PnP-ULA under mismatched posterior distribution. We numerically validate our theory on several inverse problems such as sampling from Gaussian mixture models and image deblurring. Our results suggest that the sensitivity of the sampling distribution of PnP-ULA to a mismatch in the measurement model and the denoiser can be precisely characterized.
    Pre-Training and Fine-Tuning Generative Flow Networks. (arXiv:2310.03419v1 [cs.LG])
    Generative Flow Networks (GFlowNets) are amortized samplers that learn stochastic policies to sequentially generate compositional objects from a given unnormalized reward distribution. They can generate diverse sets of high-reward objects, which is an important consideration in scientific discovery tasks. However, as they are typically trained from a given extrinsic reward function, it remains an important open challenge about how to leverage the power of pre-training and train GFlowNets in an unsupervised fashion for efficient adaptation to downstream tasks. Inspired by recent successes of unsupervised pre-training in various domains, we introduce a novel approach for reward-free pre-training of GFlowNets. By framing the training as a self-supervised problem, we propose an outcome-conditioned GFlowNet (OC-GFN) that learns to explore the candidate space. Specifically, OC-GFN learns to reach any targeted outcomes, akin to goal-conditioned policies in reinforcement learning. We show that the pre-trained OC-GFN model can allow for a direct extraction of a policy capable of sampling from any new reward functions in downstream tasks. Nonetheless, adapting OC-GFN on a downstream task-specific reward involves an intractable marginalization over possible outcomes. We propose a novel way to approximate this marginalization by learning an amortized predictor enabling efficient fine-tuning. Extensive experimental results validate the efficacy of our approach, demonstrating the effectiveness of pre-training the OC-GFN, and its ability to swiftly adapt to downstream tasks and discover modes more efficiently. This work may serve as a foundation for further exploration of pre-training strategies in the context of GFlowNets.
    GOAL: A Challenging Knowledge-grounded Video Captioning Benchmark for Real-time Soccer Commentary Generation. (arXiv:2303.14655v2 [cs.CV] UPDATED)
    Despite the recent emergence of video captioning models, how to generate vivid, fine-grained video descriptions based on the background knowledge (i.e., long and informative commentary about the domain-specific scenes with appropriate reasoning) is still far from being solved, which however has great applications such as automatic sports narrative. In this paper, we present GOAL, a benchmark of over 8.9k soccer video clips, 22k sentences, and 42k knowledge triples for proposing a challenging new task setting as Knowledge-grounded Video Captioning (KGVC). Moreover, we conduct experimental adaption of existing methods to show the difficulty and potential directions for solving this valuable and applicable task. Our data and code are available at https://github.com/THU-KEG/goal.
    GRAPES: Learning to Sample Graphs for Scalable Graph Neural Networks. (arXiv:2310.03399v1 [cs.LG])
    Graph neural networks (GNNs) learn the representation of nodes in a graph by aggregating the neighborhood information in various ways. As these networks grow in depth, their receptive field grows exponentially due to the increase in neighborhood sizes, resulting in high memory costs. Graph sampling solves memory issues in GNNs by sampling a small ratio of the nodes in the graph. This way, GNNs can scale to much larger graphs. Most sampling methods focus on fixed sampling heuristics, which may not generalize to different structures or tasks. We introduce GRAPES, an adaptive graph sampling method that learns to identify sets of influential nodes for training a GNN classifier. GRAPES uses a GFlowNet to learn node sampling probabilities given the classification objectives. We evaluate GRAPES across several small- and large-scale graph benchmarks and demonstrate its effectiveness in accuracy and scalability. In contrast to existing sampling methods, GRAPES maintains high accuracy even with small sample sizes and, therefore, can scale to very large graphs. Our code is publicly available at https://github.com/dfdazac/grapes.
    Adapting Large Language Models for Content Moderation: Pitfalls in Data Engineering and Supervised Fine-tuning. (arXiv:2310.03400v1 [cs.LG])
    Nowadays, billions of people engage in communication and express their opinions on the internet daily. Unfortunately, not all of these expressions are friendly or compliant, making content moderation an indispensable task. With the successful development of Large Language Models (LLMs) in recent years, LLM-based methods have become a feasible solution for handling tasks in various domains. However, in the field of content moderation, there is still a lack of detailed work that systematically introduces implementation details. In this paper, we introduce how to fine-tune an LLM model that can be privately deployed for content moderation. Specifically, we discuss whether incorporating reasons during the fine-tuning process would be better or if it should be treated as a classification task directly. We also explore the benefits of utilizing reasons generated by more powerful LLMs for fine-tuning privately deployed models and the impact of different processing approaches when the answers generated by the more powerful LLMs are incorrect. We report the entire research process and the key findings in this paper, hoping to provide valuable experience for researchers who are fine-tuning privately deployed models in their domain-specific research.
    Investigating the Limitation of CLIP Models: The Worst-Performing Categories. (arXiv:2310.03324v1 [cs.CV])
    Contrastive Language-Image Pre-training (CLIP) provides a foundation model by integrating natural language into visual concepts, enabling zero-shot recognition on downstream tasks. It is usually expected that satisfactory overall accuracy can be achieved across numerous domains through well-designed textual prompts. However, we found that their performance in the worst categories is significantly inferior to the overall performance. For example, on ImageNet, there are a total of 10 categories with class-wise accuracy as low as 0\%, even though the overall performance has achieved 64.1\%. This phenomenon reveals the potential risks associated with using CLIP models, particularly in risk-sensitive applications where specific categories hold significant importance. To address this issue, we investigate the alignment between the two modalities in the CLIP model and propose the Class-wise Matching Margin (\cmm) to measure the inference confusion. \cmm\ can effectively identify the worst-performing categories and estimate the potential performance of the candidate prompts. We further query large language models to enrich descriptions of worst-performing categories and build a weighted ensemble to highlight the efficient prompts. Experimental results clearly verify the effectiveness of our proposal, where the accuracy on the worst-10 categories on ImageNet is boosted to 5.2\%, without manual prompt engineering, laborious optimization, or access to labeled validation data.
    Uncertainty quantification for deep learning-based schemes for solving high-dimensional backward stochastic differential equations. (arXiv:2310.03393v1 [math.NA])
    Deep learning-based numerical schemes for solving high-dimensional backward stochastic differential equations (BSDEs) have recently raised plenty of scientific interest. While they enable numerical methods to approximate very high-dimensional BSDEs, their reliability has not been studied and is thus not understood. In this work, we study uncertainty quantification (UQ) for a class of deep learning-based BSDE schemes. More precisely, we review the sources of uncertainty involved in the schemes and numerically study the impact of different sources. Usually, the standard deviation (STD) of the approximate solutions obtained from multiple runs of the algorithm with different datasets is calculated to address the uncertainty. This approach is computationally quite expensive, especially for high-dimensional problems. Hence, we develop a UQ model that efficiently estimates the STD of the approximate solution using only a single run of the algorithm. The model also estimates the mean of the approximate solution, which can be leveraged to initialize the algorithm and improve the optimization process. Our numerical experiments show that the UQ model produces reliable estimates of the mean and STD of the approximate solution for the considered class of deep learning-based BSDE schemes. The estimated STD captures multiple sources of uncertainty, demonstrating its effectiveness in quantifying the uncertainty. Additionally, the model illustrates the improved performance when comparing different schemes based on the estimated STD values. Furthermore, it can identify hyperparameter values for which the scheme achieves good approximations.
    Stable Training of Probabilistic Models Using the Leave-One-Out Maximum Log-Likelihood Objective. (arXiv:2310.03556v1 [stat.ML])
    Probabilistic modelling of power systems operation and planning processes depends on data-driven methods, which require sufficiently large datasets. When historical data lacks this, it is desired to model the underlying data generation mechanism as a probability distribution to assess the data quality and generate more data, if needed. Kernel density estimation (KDE) based models are popular choices for this task, but they fail to adapt to data regions with varying densities. In this paper, an adaptive KDE model is employed to circumvent this, where each kernel in the model has an individual bandwidth. The leave-one-out maximum log-likelihood (LOO-MLL) criterion is proposed to prevent the singular solutions that the regular MLL criterion gives rise to, and it is proven that LOO-MLL prevents these. Relying on this guaranteed robustness, the model is extended by assigning learnable weights to the kernels. In addition, a modified expectation-maximization algorithm is employed to accelerate the optimization speed reliably. The performance of the proposed method and models are exhibited on two power systems datasets using different statistical tests and by comparison with Gaussian mixture models. Results show that the proposed models have promising performance, in addition to their singularity prevention guarantees.
    A Latent Variable Approach for Non-Hierarchical Multi-Fidelity Adaptive Sampling. (arXiv:2310.03298v1 [stat.ML])
    Multi-fidelity (MF) methods are gaining popularity for enhancing surrogate modeling and design optimization by incorporating data from various low-fidelity (LF) models. While most existing MF methods assume a fixed dataset, adaptive sampling methods that dynamically allocate resources among fidelity models can achieve higher efficiency in the exploring and exploiting the design space. However, most existing MF methods rely on the hierarchical assumption of fidelity levels or fail to capture the intercorrelation between multiple fidelity levels and utilize it to quantify the value of the future samples and navigate the adaptive sampling. To address this hurdle, we propose a framework hinged on a latent embedding for different fidelity models and the associated pre-posterior analysis to explicitly utilize their correlation for adaptive sampling. In this framework, each infill sampling iteration includes two steps: We first identify the location of interest with the greatest potential improvement using the high-fidelity (HF) model, then we search for the next sample across all fidelity levels that maximize the improvement per unit cost at the location identified in the first step. This is made possible by a single Latent Variable Gaussian Process (LVGP) model that maps different fidelity models into an interpretable latent space to capture their correlations without assuming hierarchical fidelity levels. The LVGP enables us to assess how LF sampling candidates will affect HF response with pre-posterior analysis and determine the next sample with the best benefit-to-cost ratio. Through test cases, we demonstrate that the proposed method outperforms the benchmark methods in both MF global fitting (GF) and Bayesian Optimization (BO) problems in convergence rate and robustness. Moreover, the method offers the flexibility to switch between GF and BO by simply changing the acquisition function.
    Text as Environment: A Deep Reinforcement Learning Text Readability Assessment Model. (arXiv:1912.05957v3 [cs.CL] UPDATED)
    Evaluating the readability of a text can significantly facilitate the precise expression of information in written form. The formulation of text readability assessment involves the identification of meaningful properties of the text regardless of its length. Sophisticated features and models are used to evaluate the comprehensibility of texts accurately. Despite this, the problem of assessing texts' readability efficiently remains relatively untouched. The efficiency of state-of-the-art text readability assessment models can be further improved using deep reinforcement learning models. Using a hard attention-based active inference technique, the proposed approach makes efficient use of input text and computational resources. Through the use of semi-supervised signals, the reinforcement learning model uses the minimum amount of text in order to determine text's readability. A comparison of the model on Weebit and Cambridge Exams with state-of-the-art models, such as the BERT text readability model, shows that it is capable of achieving state-of-the-art accuracy with a significantly smaller amount of input text than other models.
    Neural Language Model Pruning for Automatic Speech Recognition. (arXiv:2310.03424v1 [cs.LG])
    We study model pruning methods applied to Transformer-based neural network language models for automatic speech recognition. We explore three aspects of the pruning frame work, namely criterion, method and scheduler, analyzing their contribution in terms of accuracy and inference speed. To the best of our knowledge, such in-depth analyses on large-scale recognition systems has not been reported in the literature. In addition, we propose a variant of low-rank approximation suitable for incrementally compressing models, and delivering multiple models with varied target sizes. Among other results, we show that a) data-driven pruning outperforms magnitude-driven in several scenarios; b) incremental pruning achieves higher accuracy compared to one-shot pruning, especially when targeting smaller sizes; and c) low-rank approximation presents the best trade-off between size reduction and inference speed-up for moderate compression.
    Probabilistic Forecasting of Day-Ahead Electricity Prices and their Volatility with LSTMs. (arXiv:2310.03339v1 [cs.LG])
    Accurate forecasts of electricity prices are crucial for the management of electric power systems and the development of smart applications. European electricity prices have risen substantially and became highly volatile after the Russian invasion of Ukraine, challenging established forecasting methods. Here, we present a Long Short-Term Memory (LSTM) model for the German-Luxembourg day-ahead electricity prices addressing these challenges. The recurrent structure of the LSTM allows the model to adapt to trends, while the joint prediction of both mean and standard deviation enables a probabilistic prediction. Using a physics-inspired approach - superstatistics - to derive an explanation for the statistics of prices, we show that the LSTM model faithfully reproduces both prices and their volatility.
    Deep Controlled Learning for Inventory Control. (arXiv:2011.15122v6 [cs.LG] UPDATED)
    Problem Definition: Are traditional deep reinforcement learning (DRL) algorithms, developed for a broad range of purposes including game-play and robotics, the most suitable machine learning algorithms for applications in inventory control? To what extent would DRL algorithms tailored to the unique characteristics of inventory control problems provide superior performance compared to DRL and traditional benchmarks? Methodology/results: We propose and study Deep Controlled Learning (DCL), a new DRL framework based on approximate policy iteration specifically designed to tackle inventory problems. Comparative evaluations reveal that DCL outperforms existing state-of-the-art heuristics in lost sales inventory control, perishable inventory systems, and inventory systems with random lead times, achieving lower average costs across all test instances and maintaining an optimality gap of no more than 0.1\%. Notably, the same hyperparameter set is utilized across all experiments, underscoring the robustness and generalizability of the proposed method. Managerial implications: These substantial performance and robustness improvements pave the way for the effective application of tailored DRL algorithms to inventory management problems, empowering decision-makers to optimize stock levels, minimize costs, and enhance responsiveness across various industries.
    FedNAR: Federated Optimization with Normalized Annealing Regularization. (arXiv:2310.03163v1 [cs.LG])
    Weight decay is a standard technique to improve generalization performance in modern deep neural network optimization, and is also widely adopted in federated learning (FL) to prevent overfitting in local clients. In this paper, we first explore the choices of weight decay and identify that weight decay value appreciably influences the convergence of existing FL algorithms. While preventing overfitting is crucial, weight decay can introduce a different optimization goal towards the global objective, which is further amplified in FL due to multiple local updates and heterogeneous data distribution. To address this challenge, we develop {\it Federated optimization with Normalized Annealing Regularization} (FedNAR), a simple yet effective and versatile algorithmic plug-in that can be seamlessly integrated into any existing FL algorithms. Essentially, we regulate the magnitude of each update by performing co-clipping of the gradient and weight decay. We provide a comprehensive theoretical analysis of FedNAR's convergence rate and conduct extensive experiments on both vision and language datasets with different backbone federated optimization algorithms. Our experimental results consistently demonstrate that incorporating FedNAR into existing FL algorithms leads to accelerated convergence and heightened model accuracy. Moreover, FedNAR exhibits resilience in the face of various hyperparameter configurations. Specifically, FedNAR has the ability to self-adjust the weight decay when the initial specification is not optimal, while the accuracy of traditional FL algorithms would markedly decline. Our codes are released at \href{https://github.com/ljb121002/fednar}{https://github.com/ljb121002/fednar}.
    Resilient Legged Local Navigation: Learning to Traverse with Compromised Perception End-to-End. (arXiv:2310.03581v1 [cs.RO])
    Autonomous robots must navigate reliably in unknown environments even under compromised exteroceptive perception, or perception failures. Such failures often occur when harsh environments lead to degraded sensing, or when the perception algorithm misinterprets the scene due to limited generalization. In this paper, we model perception failures as invisible obstacles and pits, and train a reinforcement learning (RL) based local navigation policy to guide our legged robot. Unlike previous works relying on heuristics and anomaly detection to update navigational information, we train our navigation policy to reconstruct the environment information in the latent space from corrupted perception and react to perception failures end-to-end. To this end, we incorporate both proprioception and exteroception into our policy inputs, thereby enabling the policy to sense collisions on different body parts and pits, prompting corresponding reactions. We validate our approach in simulation and on the real quadruped robot ANYmal running in real-time (<10 ms CPU inference). In a quantitative comparison with existing heuristic-based locally reactive planners, our policy increases the success rate over 30% when facing perception failures. Project Page: https://bit.ly/45NBTuh.
    TimeGPT-1. (arXiv:2310.03589v1 [cs.LG])
    In this paper, we introduce TimeGPT, the first foundation model for time series, capable of generating accurate predictions for diverse datasets not seen during training. We evaluate our pre-trained model against established statistical, machine learning, and deep learning methods, demonstrating that TimeGPT zero-shot inference excels in performance, efficiency, and simplicity. Our study provides compelling evidence that insights from other domains of artificial intelligence can be effectively applied to time series analysis. We conclude that large-scale time series models offer an exciting opportunity to democratize access to precise predictions and reduce uncertainty by leveraging the capabilities of contemporary advancements in deep learning.
    Conditional Generative Models for Simulation of EMG During Naturalistic Movements. (arXiv:2211.01856v4 [cs.LG] UPDATED)
    Numerical models of electromyographic (EMG) signals have provided a huge contribution to our fundamental understanding of human neurophysiology and remain a central pillar of motor neuroscience and the development of human-machine interfaces. However, whilst modern biophysical simulations based on finite element methods are highly accurate, they are extremely computationally expensive and thus are generally limited to modelling static systems such as isometrically contracting limbs. As a solution to this problem, we propose a transfer learning approach, in which a conditional generative model is trained to mimic the output of an advanced numerical model. To this end, we present BioMime, a conditional generative neural network trained adversarially to generate motor unit activation potential waveforms under a wide variety of volume conductor parameters. We demonstrate the ability of such a model to predictively interpolate between a much smaller number of numerical model's outputs with a high accuracy. Consequently, the computational load is dramatically reduced, which allows the rapid simulation of EMG signals during truly dynamic and naturalistic movements.
    A 5' UTR Language Model for Decoding Untranslated Regions of mRNA and Function Predictions. (arXiv:2310.03281v1 [cs.LG])
    The 5' UTR, a regulatory region at the beginning of an mRNA molecule, plays a crucial role in regulating the translation process and impacts the protein expression level. Language models have showcased their effectiveness in decoding the functions of protein and genome sequences. Here, we introduced a language model for 5' UTR, which we refer to as the UTR-LM. The UTR-LM is pre-trained on endogenous 5' UTRs from multiple species and is further augmented with supervised information including secondary structure and minimum free energy. We fine-tuned the UTR-LM in a variety of downstream tasks. The model outperformed the best-known benchmark by up to 42% for predicting the Mean Ribosome Loading, and by up to 60% for predicting the Translation Efficiency and the mRNA Expression Level. The model also applies to identifying unannotated Internal Ribosome Entry Sites within the untranslated region and improves the AUPR from 0.37 to 0.52 compared to the best baseline. Further, we designed a library of 211 novel 5' UTRs with high predicted values of translation efficiency and evaluated them via a wet-lab assay. Experiment results confirmed that our top designs achieved a 32.5% increase in protein production level relative to well-established 5' UTR optimized for therapeutics.
    High-dimensional Bayesian Optimization with Group Testing. (arXiv:2310.03515v1 [cs.LG])
    Bayesian optimization is an effective method for optimizing expensive-to-evaluate black-box functions. High-dimensional problems are particularly challenging as the surrogate model of the objective suffers from the curse of dimensionality, which makes accurate modeling difficult. We propose a group testing approach to identify active variables to facilitate efficient optimization in these domains. The proposed algorithm, Group Testing Bayesian Optimization (GTBO), first runs a testing phase where groups of variables are systematically selected and tested on whether they influence the objective. To that end, we extend the well-established theory of group testing to functions of continuous ranges. In the second phase, GTBO guides optimization by placing more importance on the active dimensions. By exploiting the axis-aligned subspace assumption, GTBO is competitive against state-of-the-art methods on several synthetic and real-world high-dimensional optimization tasks. Furthermore, GTBO aids in the discovery of active parameters in applications, thereby enhancing practitioners' understanding of the problem at hand.
    Combining Differential Privacy and Byzantine Resilience in Distributed SGD. (arXiv:2110.03991v4 [cs.LG] UPDATED)
    Privacy and Byzantine resilience (BR) are two crucial requirements of modern-day distributed machine learning. The two concepts have been extensively studied individually but the question of how to combine them effectively remains unanswered. This paper contributes to addressing this question by studying the extent to which the distributed SGD algorithm, in the standard parameter-server architecture, can learn an accurate model despite (a) a fraction of the workers being malicious (Byzantine), and (b) the other fraction, whilst being honest, providing noisy information to the server to ensure differential privacy (DP). We first observe that the integration of standard practices in DP and BR is not straightforward. In fact, we show that many existing results on the convergence of distributed SGD under Byzantine faults, especially those relying on $(\alpha,f)$-Byzantine resilience, are rendered invalid when honest workers enforce DP. To circumvent this shortcoming, we revisit the theory of $(\alpha,f)$-BR to obtain an approximate convergence guarantee. Our analysis provides key insights on how to improve this guarantee through hyperparameter optimization. Essentially, our theoretical and empirical results show that (1) an imprudent combination of standard approaches to DP and BR might be fruitless, but (2) by carefully re-tuning the learning algorithm, we can obtain reasonable learning accuracy while simultaneously guaranteeing DP and BR.
    Benchmarking Large Language Models As AI Research Agents. (arXiv:2310.03302v1 [cs.LG])
    Scientific experimentation involves an iterative process of creating hypotheses, designing experiments, running experiments, and analyzing the results. Can we build AI research agents to perform these long-horizon tasks? To take a step towards building and evaluating research agents on such open-ended decision-making tasks, we focus on the problem of machine learning engineering: given a task description and a dataset, build a high-performing model. In this paper, we propose MLAgentBench, a suite of ML tasks for benchmarking AI research agents. Agents can perform actions like reading/writing files, executing code, and inspecting outputs. With these actions, agents could run experiments, analyze the results, and modify the code of entire machine learning pipelines, such as data processing, architecture, training processes, etc. The benchmark then automatically evaluates the agent's performance objectively over various metrics related to performance and efficiency. We also design an LLM-based research agent to automatically perform experimentation loops in such an environment. Empirically, we find that a GPT-4-based research agent can feasibly build compelling ML models over many tasks in MLAgentBench, displaying highly interpretable plans and actions. However, the success rates vary considerably; they span from almost 90\% on well-established older datasets to as low as 10\% on recent Kaggle Challenges -- unavailable during the LLM model's pretraining -- and even 0\% on newer research challenges like BabyLM. Finally, we identify several key challenges for LLM-based research agents such as long-term planning and hallucination. Our code is released at https://github.com/snap-stanford/MLAgentBench.
    Evaluating the Robustness of Interpretability Methods through Explanation Invariance and Equivariance. (arXiv:2304.06715v3 [cs.LG] UPDATED)
    Interpretability methods are valuable only if their explanations faithfully describe the explained model. In this work, we consider neural networks whose predictions are invariant under a specific symmetry group. This includes popular architectures, ranging from convolutional to graph neural networks. Any explanation that faithfully explains this type of model needs to be in agreement with this invariance property. We formalize this intuition through the notion of explanation invariance and equivariance by leveraging the formalism from geometric deep learning. Through this rigorous formalism, we derive (1) two metrics to measure the robustness of any interpretability method with respect to the model symmetry group; (2) theoretical robustness guarantees for some popular interpretability methods and (3) a systematic approach to increase the invariance of any interpretability method with respect to a symmetry group. By empirically measuring our metrics for explanations of models associated with various modalities and symmetry groups, we derive a set of 5 guidelines to allow users and developers of interpretability methods to produce robust explanations.
    OpenPatch: a 3D patchwork for Out-Of-Distribution detectionpdf icon. (arXiv:2310.03388v1 [cs.CV])
    Moving deep learning models from the laboratory setting to the open world entails preparing them to handle unforeseen conditions. In several applications the occurrence of novel classes during deployment poses a significant threat, thus it is crucial to effectively detect them. Ideally, this skill should be used when needed without requiring any further computational training effort at every new task. Out-of-distribution detection has attracted significant attention in the last years, however the majority of the studies deal with 2D images ignoring the inherent 3D nature of the real-world and often confusing between domain and semantic novelty. In this work, we focus on the latter, considering the objects geometric structure captured by 3D point clouds regardless of the specific domain. We advance the field by introducing OpenPatch that builds on a large pre-trained model and simply extracts from its intermediate features a set of patch representations that describe each known class. For any new sample, we obtain a novelty score by evaluating whether it can be recomposed mainly by patches of a single known class or rather via the contribution of multiple classes. We present an extensive experimental evaluation of our approach for the task of semantic novelty detection on real-world point cloud samples when the reference known data are synthetic. We demonstrate that OpenPatch excels in both the full and few-shot known sample scenarios, showcasing its robustness across varying pre-training objectives and network backbones. The inherent training-free nature of our method allows for its immediate application to a wide array of real-world tasks, offering a compelling advantage over approaches that need expensive retraining efforts.
    A Framework for Large Scale Synthetic Graph Dataset Generation. (arXiv:2210.01944v4 [cs.LG] UPDATED)
    Recently there has been increasing interest in developing and deploying deep graph learning algorithms for many tasks, such as fraud detection and recommender systems. Albeit, there is a limited number of publicly available graph-structured datasets, most of which are tiny compared to production-sized applications or are limited in their application domain. This work tackles this shortcoming by proposing a scalable synthetic graph generation tool to scale the datasets to production-size graphs with trillions of edges and billions of nodes. The tool learns a series of parametric models from proprietary datasets that can be released to researchers to study various graph methods on the synthetic data increasing prototype development and novel applications. We demonstrate the generalizability of the framework across a series of datasets, mimicking structural and feature distributions as well as the ability to scale them across varying sizes demonstrating their usefulness for benchmarking and model development. Code can be found on https://github.com/NVIDIA/DeepLearningExamples/tree/master/Tools/DGLPyTorch/SyntheticGraphGeneration.
    Two-stage LLM Fine-tuning with Less Specialization and More Generalization. (arXiv:2211.00635v2 [cs.CL] UPDATED)
    Pretrained large language models (LLMs) are general purpose problem solvers applicable to a diverse set of tasks with prompts. They can be further improved towards a specific task by fine-tuning on a specialized dataset. However, fine-tuning usually makes the model narrowly specialized on this dataset with reduced general in-context learning performances, which is undesirable whenever the fine-tuned model needs to handle additional tasks where no fine-tuning data is available. In this work, we first demonstrate that fine-tuning on a single task indeed decreases LLMs' general in-context learning performance. We discover one important cause of such forgetting, format specialization, where the model overfits to the format of the fine-tuned task. We further show that format specialization happens at the very beginning of fine-tuning. To solve this problem, we propose Prompt Tuning with MOdel Tuning (ProMoT), a simple yet effective two-stage fine-tuning framework that reduces format specialization and improves generalization. ProMoT offloads task-specific format learning into additional and removable parameters by first doing prompt tuning and then fine-tuning the model itself with this soft prompt attached. With experiments on several fine-tuning tasks and 8 in-context evaluation tasks, we show that ProMoT achieves comparable performance on fine-tuned tasks to standard fine-tuning, but with much less loss of in-context learning performances across a board range of out-of-domain evaluation tasks. More importantly, ProMoT can even enhance generalization on in-context learning tasks that are semantically related to the fine-tuned task, e.g. ProMoT on En-Fr translation significantly improves performance on other language pairs, and ProMoT on NLI improves performance on summarization. Experiments also show that ProMoT can improve the generalization performance of multi-task training.
    Enhancing Adversarial Robustness via Score-Based Optimization. (arXiv:2307.04333v2 [cs.LG] UPDATED)
    Adversarial attacks have the potential to mislead deep neural network classifiers by introducing slight perturbations. Developing algorithms that can mitigate the effects of these attacks is crucial for ensuring the safe use of artificial intelligence. Recent studies have suggested that score-based diffusion models are effective in adversarial defenses. However, existing diffusion-based defenses rely on the sequential simulation of the reversed stochastic differential equations of diffusion models, which are computationally inefficient and yield suboptimal results. In this paper, we introduce a novel adversarial defense scheme named ScoreOpt, which optimizes adversarial samples at test-time, towards original clean data in the direction guided by score-based priors. We conduct comprehensive experiments on multiple datasets, including CIFAR10, CIFAR100 and ImageNet. Our experimental results demonstrate that our approach outperforms existing adversarial defenses in terms of both robustness performance and inference speed.
    Benchmarking Local Robustness of High-Accuracy Binary Neural Networks for Enhanced Traffic Sign Recognition. (arXiv:2310.03033v1 [cs.CV])
    Traffic signs play a critical role in road safety and traffic management for autonomous driving systems. Accurate traffic sign classification is essential but challenging due to real-world complexities like adversarial examples and occlusions. To address these issues, binary neural networks offer promise in constructing classifiers suitable for resource-constrained devices. In our previous work, we proposed high-accuracy BNN models for traffic sign recognition, focusing on compact size for limited computation and energy resources. To evaluate their local robustness, this paper introduces a set of benchmark problems featuring layers that challenge state-of-the-art verification tools. These layers include binarized convolutions, max pooling, batch normalization, fully connected. The difficulty of the verification problem is given by the high number of network parameters (905k - 1.7 M), of the input dimension (2.7k-12k), and of the number of regions (43) as well by the fact that the neural networks are not sparse. The proposed BNN models and local robustness properties can be checked at https://github.com/ChristopherBrix/vnncomp2023_benchmarks/tree/main/benchmarks/traffic_signs_recognition. The results of the 4th International Verification of Neural Networks Competition (VNN-COMP'23) revealed the fact that 4, out of 7, solvers can handle many of our benchmarks randomly selected (minimum is 6, maximum is 36, out of 45). Surprisingly, tools output also wrong results or missing counterexample (ranging from 1 to 4). Currently, our focus lies in exploring the possibility of achieving a greater count of solved instances by extending the allotted time (previously set at 8 minutes). Furthermore, we are intrigued by the reasons behind the erroneous outcomes provided by the tools for certain benchmarks.
    Comparing Time-Series Analysis Approaches Utilized in Research Papers to Forecast COVID-19 Cases in Africa: A Literature Review. (arXiv:2310.03606v1 [cs.LG])
    This literature review aimed to compare various time-series analysis approaches utilized in forecasting COVID-19 cases in Africa. The study involved a methodical search for English-language research papers published between January 2020 and July 2023, focusing specifically on papers that utilized time-series analysis approaches on COVID-19 datasets in Africa. A variety of databases including PubMed, Google Scholar, Scopus, and Web of Science were utilized for this process. The research papers underwent an evaluation process to extract relevant information regarding the implementation and performance of the time-series analysis models. The study highlighted the different methodologies employed, evaluating their effectiveness and limitations in forecasting the spread of the virus. The result of this review could contribute deeper insights into the field, and future research should consider these insights to improve time series analysis models and explore the integration of different approaches for enhanced public health decision-making.
    Sparse Deep Learning for Time Series Data: Theory and Applications. (arXiv:2310.03243v1 [stat.ML])
    Sparse deep learning has become a popular technique for improving the performance of deep neural networks in areas such as uncertainty quantification, variable selection, and large-scale network compression. However, most existing research has focused on problems where the observations are independent and identically distributed (i.i.d.), and there has been little work on the problems where the observations are dependent, such as time series data and sequential data in natural language processing. This paper aims to address this gap by studying the theory for sparse deep learning with dependent data. We show that sparse recurrent neural networks (RNNs) can be consistently estimated, and their predictions are asymptotically normally distributed under appropriate assumptions, enabling the prediction uncertainty to be correctly quantified. Our numerical results show that sparse deep learning outperforms state-of-the-art methods, such as conformal predictions, in prediction uncertainty quantification for time series data. Furthermore, our results indicate that the proposed method can consistently identify the autoregressive order for time series data and outperform existing methods in large-scale model compression. Our proposed method has important practical implications in fields such as finance, healthcare, and energy, where both accurate point estimates and prediction uncertainty quantification are of concern.
    Robust Representation Learning via Asymmetric Negative Contrast and Reverse Attention. (arXiv:2310.03358v1 [cs.CV])
    Deep neural networks are vulnerable to adversarial noise. Adversarial training (AT) has been demonstrated to be the most effective defense strategy to protect neural networks from being fooled. However, we find AT omits to learning robust features, resulting in poor performance of adversarial robustness. To address this issue, we highlight two characteristics of robust representation: (1) $\bf{exclusion}$: the feature of natural examples keeps away from that of other classes; (2) $\bf{alignment}$: the feature of natural and corresponding adversarial examples is close to each other. These motivate us to propose a generic framework of AT to gain robust representation, by the asymmetric negative contrast and reverse attention. Specifically, we design an asymmetric negative contrast based on predicted probabilities, to push away examples of different classes in the feature space. Moreover, we propose to weight feature by parameters of the linear classifier as the reverse attention, to obtain class-aware feature and pull close the feature of the same class. Empirical evaluations on three benchmark datasets show our methods greatly advance the robustness of AT and achieve state-of-the-art performance. Code is available at .
    Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training. (arXiv:2110.14883v3 [cs.LG] UPDATED)
    The success of Transformer models has pushed the deep learning model scale to billions of parameters. Due to the limited memory resource of a single GPU, However, the best practice for choosing the optimal parallel strategy is still lacking, since it requires domain expertise in both deep learning and parallel computing. The Colossal-AI system addressed the above challenge by introducing a unified interface to scale your sequential code of model training to distributed environments. It supports parallel training methods such as data, pipeline, tensor, and sequence parallelism, as well as heterogeneous training methods integrated with zero redundancy optimizer. Compared to the baseline system, Colossal-AI can achieve up to 2.76 times training speedup on large-scale models.
    Interpolating between Clustering and Dimensionality Reduction with Gromov-Wasserstein. (arXiv:2310.03398v1 [cs.LG])
    We present a versatile adaptation of existing dimensionality reduction (DR) objectives, enabling the simultaneous reduction of both sample and feature sizes. Correspondances between input and embedding samples are computed through a semi-relaxed Gromov-Wasserstein optimal transport (OT) problem. When the embedding sample size matches that of the input, our model recovers classical popular DR models. When the embedding's dimensionality is unconstrained, we show that the OT plan delivers a competitive hard clustering. We emphasize the importance of intermediate stages that blend DR and clustering for summarizing real data and apply our method to visualize datasets of images.
    Multi-Resolution Audio-Visual Feature Fusion for Temporal Action Localization. (arXiv:2310.03456v1 [cs.CV])
    Temporal Action Localization (TAL) aims to identify actions' start, end, and class labels in untrimmed videos. While recent advancements using transformer networks and Feature Pyramid Networks (FPN) have enhanced visual feature recognition in TAL tasks, less progress has been made in the integration of audio features into such frameworks. This paper introduces the Multi-Resolution Audio-Visual Feature Fusion (MRAV-FF), an innovative method to merge audio-visual data across different temporal resolutions. Central to our approach is a hierarchical gated cross-attention mechanism, which discerningly weighs the importance of audio information at diverse temporal scales. Such a technique not only refines the precision of regression boundaries but also bolsters classification confidence. Importantly, MRAV-FF is versatile, making it compatible with existing FPN TAL architectures and offering a significant enhancement in performance when audio data is available.
    FedHyper: A Universal and Robust Learning Rate Scheduler for Federated Learning with Hypergradient Descent. (arXiv:2310.03156v1 [cs.LG])
    The theoretical landscape of federated learning (FL) undergoes rapid evolution, but its practical application encounters a series of intricate challenges, and hyperparameter optimization is one of these critical challenges. Amongst the diverse adjustments in hyperparameters, the adaptation of the learning rate emerges as a crucial component, holding the promise of significantly enhancing the efficacy of FL systems. In response to this critical need, this paper presents FedHyper, a novel hypergradient-based learning rate adaptation algorithm specifically designed for FL. FedHyper serves as a universal learning rate scheduler that can adapt both global and local rates as the training progresses. In addition, FedHyper not only showcases unparalleled robustness to a spectrum of initial learning rate configurations but also significantly alleviates the necessity for laborious empirical learning rate adjustments. We provide a comprehensive theoretical analysis of FedHyper's convergence rate and conduct extensive experiments on vision and language benchmark datasets. The results demonstrate that FEDHYPER consistently converges 1.1-3x faster than FedAvg and the competing baselines while achieving superior final accuracy. Moreover, FedHyper catalyzes a remarkable surge in accuracy, augmenting it by up to 15% compared to FedAvg under suboptimal initial learning rate settings.
    On the definition of toxicity in NLP. (arXiv:2310.02357v2 [cs.CL] UPDATED)
    The fundamental problem in toxicity detection task lies in the fact that the toxicity is ill-defined. This causes us to rely on subjective and vague data in models' training, which results in non-robust and non-accurate results: garbage in - garbage out. This work suggests a new, stress-level-based definition of toxicity designed to be objective and context-aware. On par with it, we also describe possible ways of applying this new definition to dataset creation and model training.
    Towards Understanding the Effect of Pretraining Label Granularity. (arXiv:2303.16887v2 [cs.CV] UPDATED)
    In this paper, we study how the granularity of pretraining labels affects the generalization of deep neural networks in image classification tasks. We focus on the "fine-to-coarse" transfer learning setting, where the pretraining label space is more fine-grained than that of the target problem. Empirically, we show that pretraining on the leaf labels of ImageNet21k produces better transfer results on ImageNet1k than pretraining on other coarser granularity levels, which supports the common practice used in the community. Theoretically, we explain the benefit of fine-grained pretraining by proving that, for a data distribution satisfying certain hierarchy conditions, 1) coarse-grained pretraining only allows a neural network to learn the "common" or "easy-to-learn" features well, while 2) fine-grained pretraining helps the network learn the "rarer" or "fine-grained" features in addition to the common ones, thus improving its accuracy on hard downstream test samples in which common features are missing or weak in strength. Furthermore, we perform comprehensive experiments using the label hierarchies of iNaturalist 2021 and observe that the following conditions, in addition to proper choice of label granularity, enable the transfer to work well in practice: 1) the pretraining dataset needs to have a meaningful label hierarchy, and 2) the pretraining and target label functions need to align well.
    Learning Energy Decompositions for Partial Inference of GFlowNets. (arXiv:2310.03301v1 [cs.LG])
    This paper studies generative flow networks (GFlowNets) to sample objects from the Boltzmann energy distribution via a sequence of actions. In particular, we focus on improving GFlowNet with partial inference: training flow functions with the evaluation of the intermediate states or transitions. To this end, the recently developed forward-looking GFlowNet reparameterizes the flow functions based on evaluating the energy of intermediate states. However, such an evaluation of intermediate energies may (i) be too expensive or impossible to evaluate and (ii) even provide misleading training signals under large energy fluctuations along the sequence of actions. To resolve this issue, we propose learning energy decompositions for GFlowNets (LED-GFN). Our main idea is to (i) decompose the energy of an object into learnable potential functions defined on state transitions and (ii) reparameterize the flow functions using the potential functions. In particular, to produce informative local credits, we propose to regularize the potential to change smoothly over the sequence of actions. It is also noteworthy that training GFlowNet with our learned potential can preserve the optimal policy. We empirically verify the superiority of LED-GFN in five problems including the generation of unstructured and maximum independent sets, molecular graphs, and RNA sequences.
    Deep Ridgelet Transform: Voice with Koopman Operator Proves Universality of Formal Deep Networks. (arXiv:2310.03529v1 [cs.LG])
    We identify hidden layers inside a DNN with group actions on the data space, and formulate the DNN as a dual voice transform with respect to Koopman operator, a linear representation of the group action. Based on the group theoretic arguments, particularly by using Schur's lemma, we show a simple proof of the universality of those DNNs.
    Relational Convolutional Networks: A framework for learning representations of hierarchical relations. (arXiv:2310.03240v1 [cs.LG])
    A maturing area of research in deep learning is the development of architectures that can learn explicit representations of relational features. In this paper, we focus on the problem of learning representations of hierarchical relations, proposing an architectural framework we call "relational convolutional networks". Given a sequence of objects, a "multi-dimensional inner product relation" module produces a relation tensor describing all pairwise relations. A "relational convolution" layer then transforms the relation tensor into a sequence of new objects, each describing the relations within some group of objects at the previous layer. Graphlet filters, analogous to filters in convolutional neural networks, represent a template of relations against which the relation tensor is compared at each grouping. Repeating this yields representations of higher-order, hierarchical relations. We present the motivation and details of the architecture, together with a set of experiments to demonstrate how relational convolutional networks can provide an effective framework for modeling relational tasks that have hierarchical structure.
    The Geometric Structure of Fully-Connected ReLU-Layers. (arXiv:2310.03482v1 [cs.LG])
    We formalize and interpret the geometric structure of $d$-dimensional fully connected ReLU-layers in neural networks. The parameters of a ReLU-layer induce a natural partition of the input domain, such that in each sector of the partition, the ReLU-layer can be greatly simplified. This leads to a geometric interpretation of a ReLU-layer as a projection onto a polyhedral cone followed by an affine transformation, in line with the description in [doi:10.48550/arXiv.1905.08922] for convolutional networks with ReLU activations. Further, this structure facilitates simplified expressions for preimages of the intersection between partition sectors and hyperplanes, which is useful when describing decision boundaries in a classification setting. We investigate this in detail for a feed-forward network with one hidden ReLU-layer, where we provide results on the geometric complexity of the decision boundary generated by such networks, as well as proving that modulo an affine transformation, such a network can only generate $d$ different decision boundaries. Finally, the effect of adding more layers to the network is discussed.
    Towards out-of-distribution generalizable predictions of chemical kinetics properties. (arXiv:2310.03152v1 [cs.LG])
    Machine Learning (ML) techniques have found applications in estimating chemical kinetics properties. With the accumulated drug molecules identified through "AI4drug discovery", the next imperative lies in AI-driven design for high-throughput chemical synthesis processes, with the estimation of properties of unseen reactions with unexplored molecules. To this end, the existing ML approaches for kinetics property prediction are required to be Out-Of-Distribution (OOD) generalizable. In this paper, we categorize the OOD kinetic property prediction into three levels (structure, condition, and mechanism), revealing unique aspects of such problems. Under this framework, we create comprehensive datasets to benchmark (1) the state-of-the-art ML approaches for reaction prediction in the OOD setting and (2) the state-of-the-art graph OOD methods in kinetics property prediction problems. Our results demonstrated the challenges and opportunities in OOD kinetics property prediction. Our datasets and benchmarks can further support research in this direction.
    Leveraging Model-based Trees as Interpretable Surrogate Models for Model Distillation. (arXiv:2310.03112v1 [stat.ML])
    Surrogate models play a crucial role in retrospectively interpreting complex and powerful black box machine learning models via model distillation. This paper focuses on using model-based trees as surrogate models which partition the feature space into interpretable regions via decision rules. Within each region, interpretable models based on additive main effects are used to approximate the behavior of the black box model, striking for an optimal balance between interpretability and performance. Four model-based tree algorithms, namely SLIM, GUIDE, MOB, and CTree, are compared regarding their ability to generate such surrogate models. We investigate fidelity, interpretability, stability, and the algorithms' capability to capture interaction effects through appropriate splits. Based on our comprehensive analyses, we finally provide an overview of user-specific recommendations.
    Mitigating Pilot Contamination and Enabling IoT Scalability in Massive MIMO Systems. (arXiv:2310.03278v1 [cs.IT])
    Massive MIMO is expected to play an important role in the development of 5G networks. This paper addresses the issue of pilot contamination and scalability in massive MIMO systems. The current practice of reusing orthogonal pilot sequences in adjacent cells leads to difficulty in differentiating incoming inter- and intra-cell pilot sequences. One possible solution is to increase the number of orthogonal pilot sequences, which results in dedicating more space of coherence block to pilot transmission than data transmission. This, in turn, also hinders the scalability of massive MIMO systems, particularly in accommodating a large number of IoT devices within a cell. To overcome these challenges, this paper devises an innovative pilot allocation scheme based on the data transfer patterns of IoT devices. The scheme assigns orthogonal pilot sequences to clusters of devices instead of individual devices, allowing multiple devices to utilize the same pilot for periodically transmitting data. Moreover, we formulate the pilot assignment problem as a graph coloring problem and use the max k-cut graph partitioning approach to overcome the pilot contamination in a multicell massive MIMO system. The proposed scheme significantly improves the spectral efficiency and enables the scalability of massive MIMO systems; for instance, by using ten orthogonal pilot sequences, we are able to accommodate 200 devices with only a 12.5% omission rate.
    Molecule Design by Latent Prompt Transformer. (arXiv:2310.03253v1 [cs.LG])
    This paper proposes a latent prompt Transformer model for solving challenging optimization problems such as molecule design, where the goal is to find molecules with optimal values of a target chemical or biological property that can be computed by an existing software. Our proposed model consists of three components. (1) A latent vector whose prior distribution is modeled by a Unet transformation of a Gaussian white noise vector. (2) A molecule generation model that generates the string-based representation of molecule conditional on the latent vector in (1). We adopt the causal Transformer model that takes the latent vector in (1) as prompt. (3) A property prediction model that predicts the value of the target property of a molecule based on a non-linear regression on the latent vector in (1). We call the proposed model the latent prompt Transformer model. After initial training of the model on existing molecules and their property values, we then gradually shift the model distribution towards the region that supports desired values of the target property for the purpose of molecule design. Our experiments show that our proposed model achieves state of the art performances on several benchmark molecule design tasks.
    Detecting Electricity Service Equity Issues with Transfer Counterfactual Learning on Large-Scale Outage Datasets. (arXiv:2310.03258v1 [cs.LG])
    Energy justice is a growing area of interest in interdisciplinary energy research. However, identifying systematic biases in the energy sector remains challenging due to confounding variables, intricate heterogeneity in treatment effects, and limited data availability. To address these challenges, we introduce a novel approach for counterfactual causal analysis centered on energy justice. We use subgroup analysis to manage diverse factors and leverage the idea of transfer learning to mitigate data scarcity in each subgroup. In our numerical analysis, we apply our method to a large-scale customer-level power outage data set and investigate the counterfactual effect of demographic factors, such as income and age of the population, on power outage durations. Our results indicate that low-income and elderly-populated areas consistently experience longer power outages, regardless of weather conditions. This points to existing biases in the power system and highlights the need for focused improvements in areas with economic challenges.
    Raze to the Ground: Query-Efficient Adversarial HTML Attacks on Machine-Learning Phishing Webpage Detectors. (arXiv:2310.03166v1 [cs.CR])
    Machine-learning phishing webpage detectors (ML-PWD) have been shown to suffer from adversarial manipulations of the HTML code of the input webpage. Nevertheless, the attacks recently proposed have demonstrated limited effectiveness due to their lack of optimizing the usage of the adopted manipulations, and they focus solely on specific elements of the HTML code. In this work, we overcome these limitations by first designing a novel set of fine-grained manipulations which allow to modify the HTML code of the input phishing webpage without compromising its maliciousness and visual appearance, i.e., the manipulations are functionality- and rendering-preserving by design. We then select which manipulations should be applied to bypass the target detector by a query-efficient black-box optimization algorithm. Our experiments show that our attacks are able to raze to the ground the performance of current state-of-the-art ML-PWD using just 30 queries, thus overcoming the weaker attacks developed in previous work, and enabling a much fairer robustness evaluation of ML-PWD.
    History Matching for Geological Carbon Storage using Data-Space Inversion with Spatio-Temporal Data Parameterization. (arXiv:2310.03228v1 [cs.LG])
    History matching based on monitoring data will enable uncertainty reduction, and thus improved aquifer management, in industrial-scale carbon storage operations. In traditional model-based data assimilation, geomodel parameters are modified to force agreement between flow simulation results and observations. In data-space inversion (DSI), history-matched quantities of interest, e.g., posterior pressure and saturation fields conditioned to observations, are inferred directly, without constructing posterior geomodels. This is accomplished efficiently using a set of O(1000) prior simulation results, data parameterization, and posterior sampling within a Bayesian setting. In this study, we develop and implement (in DSI) a deep-learning-based parameterization to represent spatio-temporal pressure and CO2 saturation fields at a set of time steps. The new parameterization uses an adversarial autoencoder (AAE) for dimension reduction and a convolutional long short-term memory (convLSTM) network to represent the spatial distribution and temporal evolution of the pressure and saturation fields. This parameterization is used with an ensemble smoother with multiple data assimilation (ESMDA) in the DSI framework to enable posterior predictions. A realistic 3D system characterized by prior geological realizations drawn from a range of geological scenarios is considered. A local grid refinement procedure is introduced to estimate the error covariance term that appears in the history matching formulation. Extensive history matching results are presented for various quantities, for multiple synthetic true models. Substantial uncertainty reduction in posterior pressure and saturation fields is achieved in all cases. The framework is applied to efficiently provide posterior predictions for a range of error covariance specifications. Such an assessment would be expensive using a model-based approach.
    Maximum Likelihood Estimation of Latent Variable Structural Equation Models: A Neural Network Approach. (arXiv:2309.14073v2 [stat.ML] UPDATED)
    We propose a graphical structure for structural equation models that is stable under marginalization under linearity and Gaussianity assumptions. We show that computing the maximum likelihood estimation of this model is equivalent to training a neural network. We implement a GPU-based algorithm that computes the maximum likelihood estimation of these models.
    Knowledge Distillation Under Ideal Joint Classifier Assumption. (arXiv:2304.11004v2 [cs.LG] UPDATED)
    Knowledge distillation constitutes a potent methodology for condensing substantial neural networks into more compact and efficient counterparts. Within this context, softmax regression representation learning serves as a widely embraced approach, leveraging a pre-established teacher network to guide the learning process of a diminutive student network. Notably, despite the extensive inquiry into the efficacy of softmax regression representation learning, the intricate underpinnings governing the knowledge transfer mechanism remain inadequately elucidated. This study introduces the 'Ideal Joint Classifier Knowledge Distillation' (IJCKD) framework, an overarching paradigm that not only furnishes a lucid and exhaustive comprehension of prevailing knowledge distillation techniques but also establishes a theoretical underpinning for prospective investigations. Employing mathematical methodologies derived from domain adaptation theory, this investigation conducts a comprehensive examination of the error boundary of the student network contingent upon the teacher network. Consequently, our framework facilitates efficient knowledge transference between teacher and student networks, thereby accommodating a diverse spectrum of applications.
    PoseAction: Action Recognition for Patients in the Ward using Deep Learning Approaches. (arXiv:2310.03288v1 [cs.CV])
    Real-time intelligent detection and prediction of subjects' behavior particularly their movements or actions is critical in the ward. This approach offers the advantage of reducing in-hospital care costs and improving the efficiency of healthcare workers, which is especially true for scenarios at night or during peak admission periods. Therefore, in this work, we propose using computer vision (CV) and deep learning (DL) methods for detecting subjects and recognizing their actions. We utilize OpenPose as an accurate subject detector for recognizing the positions of human subjects in the video stream. Additionally, we employ AlphAction's Asynchronous Interaction Aggregation (AIA) network to predict the actions of detected subjects. This integrated model, referred to as PoseAction, is proposed. At the same time, the proposed model is further trained to predict 12 common actions in ward areas, such as staggering, chest pain, and falling down, using medical-related video clips from the NTU RGB+D and NTU RGB+D 120 datasets. The results demonstrate that PoseAction achieves the highest classification mAP of 98.72% (IoU@0.5). Additionally, this study develops an online real-time mode for action recognition, which strongly supports the clinical translation of PoseAction. Furthermore, using OpenPose's function for recognizing face key points, we also implement face blurring, which is a practical solution to address the privacy protection concerns of patients and healthcare workers. Nevertheless, the training data for PoseAction is currently limited, particularly in terms of label diversity. Consequently, the subsequent step involves utilizing a more diverse dataset (including general actions) to train the model's parameters for improved generalization.
    Memoria: Hebbian Memory Architecture for Human-Like Sequential Processing. (arXiv:2310.03052v1 [cs.LG])
    Transformers have demonstrated their success in various domains and tasks. However, Transformers struggle with long input sequences due to their limited capacity. While one solution is to increase input length, endlessly stretching the length is unrealistic. Furthermore, humans selectively remember and use only relevant information from inputs, unlike Transformers which process all raw data from start to end. We introduce Memoria, a general memory network that applies Hebbian theory which is a major theory explaining human memory formulation to enhance long-term dependencies in neural networks. Memoria stores and retrieves information called engram at multiple memory levels of working memory, short-term memory, and long-term memory, using connection weights that change according to Hebb's rule. Through experiments with popular Transformer-based models like BERT and GPT, we present that Memoria significantly improves the ability to consider long-term dependencies in various tasks. Results show that Memoria outperformed existing methodologies in sorting and language modeling, and long text classification.
    DP-SGD for non-decomposable objective functions. (arXiv:2310.03104v1 [cs.LG])
    Unsupervised pre-training is a common step in developing computer vision models and large language models. In this setting, the absence of labels requires the use of similarity-based loss functions, such as contrastive loss, that favor minimizing the distance between similar inputs and maximizing the distance between distinct inputs. As privacy concerns mount, training these models using differential privacy has become more important. However, due to how inputs are generated for these losses, one of their undesirable properties is that their $L_2$ sensitivity can grow with increasing batch size. This property is particularly disadvantageous for differentially private training methods, such as DP-SGD. To overcome this issue, we develop a new DP-SGD variant for similarity based loss functions -- in particular the commonly used contrastive loss -- that manipulates gradients of the objective function in a novel way to obtain a senstivity of the summed gradient that is $O(1)$ for batch size $n$. We test our DP-SGD variant on some preliminary CIFAR-10 pre-training and CIFAR-100 finetuning tasks and show that, in both tasks, our method's performance comes close to that of a non-private model and generally outperforms DP-SGD applied directly to the contrastive loss.
    How Prevalent is Gender Bias in ChatGPT? -- Exploring German and English ChatGPT Responses. (arXiv:2310.03031v1 [cs.CL])
    With the introduction of ChatGPT, OpenAI made large language models (LLM) accessible to users with limited IT expertise. However, users with no background in natural language processing (NLP) might lack a proper understanding of LLMs. Thus the awareness of their inherent limitations, and therefore will take the systems' output at face value. In this paper, we systematically analyse prompts and the generated responses to identify possible problematic issues with a special focus on gender biases, which users need to be aware of when processing the system's output. We explore how ChatGPT reacts in English and German if prompted to answer from a female, male, or neutral perspective. In an in-depth investigation, we examine selected prompts and analyse to what extent responses differ if the system is prompted several times in an identical way. On this basis, we show that ChatGPT is indeed useful for helping non-IT users draft texts for their daily work. However, it is absolutely crucial to thoroughly check the system's responses for biases as well as for syntactic and grammatical mistakes.
    Multi-modal Gaussian Process Variational Autoencoders for Neural and Behavioral Data. (arXiv:2310.03111v1 [cs.LG])
    Characterizing the relationship between neural population activity and behavioral data is a central goal of neuroscience. While latent variable models (LVMs) are successful in describing high-dimensional time-series data, they are typically only designed for a single type of data, making it difficult to identify structure shared across different experimental data modalities. Here, we address this shortcoming by proposing an unsupervised LVM which extracts temporally evolving shared and independent latents for distinct, simultaneously recorded experimental modalities. We do this by combining Gaussian Process Factor Analysis (GPFA), an interpretable LVM for neural spiking data with temporally smooth latent space, with Gaussian Process Variational Autoencoders (GP-VAEs), which similarly use a GP prior to characterize correlations in a latent space, but admit rich expressivity due to a deep neural network mapping to observations. We achieve interpretability in our model by partitioning latent variability into components that are either shared between or independent to each modality. We parameterize the latents of our model in the Fourier domain, and show improved latent identification using this approach over standard GP-VAE methods. We validate our model on simulated multi-modal data consisting of Poisson spike counts and MNIST images that scale and rotate smoothly over time. We show that the multi-modal GP-VAE (MM-GPVAE) is able to not only identify the shared and independent latent structure across modalities accurately, but provides good reconstructions of both images and neural rates on held-out trials. Finally, we demonstrate our framework on two real world multi-modal experimental settings: Drosophila whole-brain calcium imaging alongside tracked limb positions, and Manduca sexta spike train measurements from ten wing muscles as the animal tracks a visual stimulus.
    PDR-CapsNet: an Energy-Efficient Parallel Approach to Dynamic Routing in Capsule Networks. (arXiv:2310.03212v1 [cs.LG])
    Convolutional Neural Networks (CNNs) have produced state-of-the-art results for image classification tasks. However, they are limited in their ability to handle rotational and viewpoint variations due to information loss in max-pooling layers. Capsule Networks (CapsNets) employ a computationally-expensive iterative process referred to as dynamic routing to address these issues. CapsNets, however, often fall short on complex datasets and require more computational resources than CNNs. To overcome these challenges, we introduce the Parallel Dynamic Routing CapsNet (PDR-CapsNet), a deeper and more energy-efficient alternative to CapsNet that offers superior performance, less energy consumption, and lower overfitting rates. By leveraging a parallelization strategy, PDR-CapsNet mitigates the computational complexity of CapsNet and increases throughput, efficiently using hardware resources. As a result, we achieve 83.55\% accuracy while requiring 87.26\% fewer parameters, 32.27\% and 47.40\% fewer MACs, and Flops, achieving 3x faster inference and 7.29J less energy consumption on a 2080Ti GPU with 11GB VRAM compared to CapsNet and for the CIFAR-10 dataset.
    Regret Analysis of Distributed Online Control for LTI Systems with Adversarial Disturbances. (arXiv:2310.03206v1 [math.OC])
    This paper addresses the distributed online control problem over a network of linear time-invariant (LTI) systems (with possibly unknown dynamics) in the presence of adversarial perturbations. There exists a global network cost that is characterized by a time-varying convex function, which evolves in an adversarial manner and is sequentially and partially observed by local agents. The goal of each agent is to generate a control sequence that can compete with the best centralized control policy in hindsight, which has access to the global cost. This problem is formulated as a regret minimization. For the case of known dynamics, we propose a fully distributed disturbance feedback controller that guarantees a regret bound of $O(\sqrt{T}\log T)$, where $T$ is the time horizon. For the unknown dynamics case, we design a distributed explore-then-commit approach, where in the exploration phase all agents jointly learn the system dynamics, and in the learning phase our proposed control algorithm is applied using each agent system estimate. We establish a regret bound of $O(T^{2/3} \text{poly}(\log T))$ for this setting.
    Sharpness-Aware Minimization and the Edge of Stability. (arXiv:2309.12488v3 [cs.LG] UPDATED)
    Recent experiments have shown that, often, when training a neural network with gradient descent (GD) with a step size $\eta$, the operator norm of the Hessian of the loss grows until it approximately reaches $2/\eta$, after which it fluctuates around this value. The quantity $2/\eta$ has been called the "edge of stability" based on consideration of a local quadratic approximation of the loss. We perform a similar calculation to arrive at an "edge of stability" for Sharpness-Aware Minimization (SAM), a variant of GD which has been shown to improve its generalization. Unlike the case for GD, the resulting SAM-edge depends on the norm of the gradient. Using three deep learning training tasks, we see empirically that SAM operates on the edge of stability identified by this analysis.
    Synergistic Fusion of Graph and Transformer Features for Enhanced Molecular Property Prediction. (arXiv:2310.03027v1 [physics.chem-ph])
    Molecular property prediction is a critical task in computational drug discovery. While recent advances in Graph Neural Networks (GNNs) and Transformers have shown to be effective and promising, they face the following limitations: Transformer self-attention does not explicitly consider the underlying molecule structure while GNN feature representation alone is not sufficient to capture granular and hidden interactions and characteristics that distinguish similar molecules. To address these limitations, we propose SYN- FUSION, a novel approach that synergistically combines pre-trained features from GNNs and Transformers. This approach provides a comprehensive molecular representation, capturing both the global molecule structure and the individual atom characteristics. Experimental results on MoleculeNet benchmarks demonstrate superior performance, surpassing previous models in 5 out of 7 classification datasets and 4 out of 6 regression datasets. The performance of SYN-FUSION has been compared with other Graph-Transformer models that have been jointly trained using a combination of transformer and graph features, and it is found that our approach is on par with those models in terms of performance. Extensive analysis of the learned fusion model across aspects such as loss, latent space, and weight distribution further validates the effectiveness of SYN-FUSION. Finally, an ablation study unequivocally demonstrates that the synergy achieved by SYN-FUSION surpasses the performance of its individual model components and their ensemble, offering a substantial improvement in predicting molecular properties.
    Discovering Knowledge-Critical Subnetworks in Pretrained Language Models. (arXiv:2310.03084v1 [cs.CL])
    Pretrained language models (LMs) encode implicit representations of knowledge in their parameters. However, localizing these representations and disentangling them from each other remains an open problem. In this work, we investigate whether pretrained language models contain various knowledge-critical subnetworks: particular sparse computational subgraphs responsible for encoding specific knowledge the model has memorized. We propose a multi-objective differentiable weight masking scheme to discover these subnetworks and show that we can use them to precisely remove specific knowledge from models while minimizing adverse effects on the behavior of the original language model. We demonstrate our method on multiple GPT2 variants, uncovering highly sparse subnetworks (98%+) that are solely responsible for specific collections of relational knowledge. When these subnetworks are removed, the remaining network maintains most of its initial capacity (modeling language and other memorized relational knowledge) but struggles to express the removed knowledge, and suffers performance drops on examples needing this removed knowledge on downstream tasks after finetuning.
    Assessment of Prediction Intervals Using Uncertainty Characteristics Curves. (arXiv:2310.03158v1 [cs.LG])
    Accurate quantification of model uncertainty has long been recognized as a fundamental requirement for trusted AI. In regression tasks, uncertainty is typically quantified using prediction intervals calibrated to an ad-hoc operating point, making evaluation and comparison across different studies relatively difficult. Our work leverages: (1) the concept of operating characteristics curves and (2) the notion of a gain over a null reference, to derive a novel operating point agnostic assessment methodology for prediction intervals. The paper defines the Uncertainty Characteristics Curve and demonstrates its utility in selected scenarios. We argue that the proposed method addresses the current need for comprehensive assessment of prediction intervals and thus represents a valuable addition to the uncertainty quantification toolbox.
    Physics-Informed Neural Networks for Accelerating Power System State Estimation. (arXiv:2310.03088v1 [cs.LG])
    State estimation is the cornerstone of the power system control center since it provides the operating condition of the system in consecutive time intervals. This work investigates the application of physics-informed neural networks (PINNs) for accelerating power systems state estimation in monitoring the operation of power systems. Traditional state estimation techniques often rely on iterative algorithms that can be computationally intensive, particularly for large-scale power systems. In this paper, a novel approach that leverages the inherent physical knowledge of power systems through the integration of PINNs is proposed. By incorporating physical laws as prior knowledge, the proposed method significantly reduces the computational complexity associated with state estimation while maintaining high accuracy. The proposed method achieves up to 11% increase in accuracy, 75% reduction in standard deviation of results, and 30% faster convergence, as demonstrated by comprehensive experiments on the IEEE 14-bus system.
    Creating an Atlas of Normal Tissue for Pruning WSI Patching Through Anomaly Detection. (arXiv:2310.03106v1 [eess.IV])
    Patching gigapixel whole slide images (WSIs) is an important task in computational pathology. Some methods have been proposed to select a subset of patches as WSI representation for downstream tasks. While most of the computational pathology tasks are designed to classify or detect the presence of pathological lesions in each WSI, the confounding role and redundant nature of normal histology in tissue samples are generally overlooked in WSI representations. In this paper, we propose and validate the concept of an "atlas of normal tissue" solely using samples of WSIs obtained from normal tissue biopsies. Such atlases can be employed to eliminate normal fragments of tissue samples and hence increase the representativeness collection of patches. We tested our proposed method by establishing a normal atlas using 107 normal skin WSIs and demonstrated how established indexes and search engines like Yottixel can be improved. We used 553 WSIs of cutaneous squamous cell carcinoma (cSCC) to show the advantage. We also validated our method applied to an external dataset of 451 breast WSIs. The number of selected WSI patches was reduced by 30% to 50% after utilizing the proposed normal atlas while maintaining the same indexing and search performance in leave-one-patinet-out validation for both datasets. We show that the proposed normal atlas shows promise for unsupervised selection of the most representative patches of the abnormal/malignant WSI lesions.
    Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning. (arXiv:2310.03094v1 [cs.CL])
    Large language models (LLMs) such as GPT-4 have exhibited remarkable performance in a variety of tasks, but this strong performance often comes with the high expense of using paid API services. In this paper, we are motivated to study building an LLM cascade to save the cost of using LLMs, particularly for performing reasoning (e.g., mathematical, causal) tasks. Our cascade pipeline follows the intuition that simpler questions can be addressed by a weaker but more affordable LLM, whereas only the challenging questions necessitate the stronger and more expensive LLM. To realize this decision-making, we consider the "answer consistency" of the weaker LLM as a signal of the question difficulty and propose several methods for the answer sampling and consistency checking, including one leveraging a mixture of two thought representations (i.e., Chain-of-Thought and Program-of-Thought). Through experiments on six reasoning benchmark datasets, with GPT-3.5-turbo and GPT-4 being the weaker and stronger LLMs, respectively, we demonstrate that our proposed LLM cascades can achieve performance comparable to using solely the stronger LLM but require only 40% of its cost.
    QuATON: Quantization Aware Training of Optical Neurons. (arXiv:2310.03049v1 [cs.LG])
    Optical neural architectures (ONAs) use coding elements with optimized physical parameters to perform intelligent measurements. However, fabricating ONAs while maintaining design performances is challenging. Limitations in fabrication techniques often limit the realizable precision of the trained parameters. Physical constraints may also limit the range of values the physical parameters can hold. Thus, ONAs should be trained within the implementable constraints. However, such physics-based constraints reduce the training objective to a constrained optimization problem, making it harder to optimize with existing gradient-based methods. To alleviate these critical issues that degrade performance from simulation to realization we propose a physics-informed quantization-aware training framework. Our approach accounts for the physical constraints during the training process, leading to robust designs. We evaluate our approach on an ONA proposed in the literature, named a diffractive deep neural network (D2NN), for all-optical phase imaging and for classification of phase objects. With extensive experiments on different quantization levels and datasets, we show that our approach leads to ONA designs that are robust to quantization noise.
    Enhancing Accuracy in Deep Learning Using Random Matrix Theory. (arXiv:2310.03165v1 [cs.LG])
    In this study, we explore the applications of random matrix theory (RMT) in the training of deep neural networks (DNNs), focusing on layer pruning to simplify DNN architecture and loss landscape. RMT, recently used to address overfitting in deep learning, enables the examination of DNN's weight layer spectra. We use these techniques to optimally determine the number of singular values to be removed from the weight layers of a DNN during training via singular value decomposition (SVD). This process aids in DNN simplification and accuracy enhancement, as evidenced by training simple DNN models on the MNIST and Fashion MNIST datasets. Our method can be applied to any fully connected or convolutional layer of a pretrained DNN, decreasing the layer's parameters and simplifying the DNN architecture while preserving or even enhancing the model's accuracy. By discarding small singular values based on RMT criteria, the accuracy of the test set remains consistent, facilitating more efficient DNN training without compromising performance. We provide both theoretical and empirical evidence supporting our claim that the elimination of small singular values based on RMT does not negatively impact the DNN's accuracy. Our results offer valuable insights into the practical application of RMT for the creation of more efficient and accurate deep-learning models.
    The Cadenza ICASSP 2024 Grand Challenge. (arXiv:2310.03480v1 [eess.AS])
    The Cadenza project aims to enhance the audio quality of music for individuals with hearing loss. As part of this, the project is organizing the ICASSP SP Cadenza Challenge: Music Demixing/Remixing for Hearing Aids. The challenge can be tackled by decomposing the music at the hearing aid microphones into vocals, bass, drums, and other components. These can then be intelligently remixed in a personalized manner to improve audio quality. Alternatively, an end-to-end approach could be used. Processes need to consider the music itself, the gain applied to each component, and the listener's hearing loss. The submitted entries will be evaluated using the intrusive objective metric, the Hearing Aid Audio Quality Index (HAAQI). This paper outlines the challenge.
    Safe Exploration in Reinforcement Learning: A Generalized Formulation and Algorithms. (arXiv:2310.03225v1 [cs.LG])
    Safe exploration is essential for the practical use of reinforcement learning (RL) in many real-world scenarios. In this paper, we present a generalized safe exploration (GSE) problem as a unified formulation of common safe exploration problems. We then propose a solution of the GSE problem in the form of a meta-algorithm for safe exploration, MASE, which combines an unconstrained RL algorithm with an uncertainty quantifier to guarantee safety in the current episode while properly penalizing unsafe explorations before actual safety violation to discourage them in future episodes. The advantage of MASE is that we can optimize a policy while guaranteeing with a high probability that no safety constraint will be violated under proper assumptions. Specifically, we present two variants of MASE with different constructions of the uncertainty quantifier: one based on generalized linear models with theoretical guarantees of safety and near-optimality, and another that combines a Gaussian process to ensure safety with a deep RL algorithm to maximize the reward. Finally, we demonstrate that our proposed algorithm achieves better performance than state-of-the-art algorithms on grid-world and Safety Gym benchmarks without violating any safety constraints, even during training.
    TacoGFN: Target Conditioned GFlowNet for Structure-Based Drug Design. (arXiv:2310.03223v1 [cs.LG])
    We seek to automate the generation of drug-like compounds conditioned to specific protein pocket targets. Most current methods approximate the protein-molecule distribution of a finite dataset and, therefore struggle to generate molecules with significant binding improvement over the training dataset. We instead frame the pocket-conditioned molecular generation task as an RL problem and develop TacoGFN, a target conditional Generative Flow Network model. Our method is explicitly encouraged to generate molecules with desired properties as opposed to fitting on a pre-existing data distribution. To this end, we develop transformer-based docking score prediction to speed up docking score computation and propose TacoGFN to explore molecule space efficiently. Furthermore, we incorporate several rounds of active learning where generated samples are queried using a docking oracle to improve the docking score prediction. This approach allows us to accurately explore as much of the molecule landscape as we can afford computationally. Empirically, molecules generated using TacoGFN and its variants significantly outperform all baseline methods across every property (Docking score, QED, SA, Lipinski), while being orders of magnitude faster.
    Formal and Practical Elements for the Certification of Machine Learning Systems. (arXiv:2310.03217v1 [cs.LG])
    Over the past decade, machine learning has demonstrated impressive results, often surpassing human capabilities in sensing tasks relevant to autonomous flight. Unlike traditional aerospace software, the parameters of machine learning models are not hand-coded nor derived from physics but learned from data. They are automatically adjusted during a training phase, and their values do not usually correspond to physical requirements. As a result, requirements cannot be directly traced to lines of code, hindering the current bottom-up aerospace certification paradigm. This paper attempts to address this gap by 1) demystifying the inner workings and processes to build machine learning models, 2) formally establishing theoretical guarantees given by those processes, and 3) complementing these formal elements with practical considerations to develop a complete certification argument for safety-critical machine learning systems. Based on a scalable statistical verifier, our proposed framework is model-agnostic and tool-independent, making it adaptable to many use cases in the industry. We demonstrate results on a widespread application in autonomous flight: vision-based landing.
    Fragment-based Pretraining and Finetuning on Molecular Graphs. (arXiv:2310.03274v1 [cs.LG])
    Property prediction on molecular graphs is an important application of Graph Neural Networks (GNNs). Recently, unlabeled molecular data has become abundant, which facilitates the rapid development of self-supervised learning for GNNs in the chemical domain. In this work, we propose pretraining GNNs at the fragment level, which serves as a promising middle ground to overcome the limitations of node-level and graph-level pretraining. Borrowing techniques from recent work on principle subgraph mining, we obtain a compact vocabulary of prevalent fragments that span a large pretraining dataset. From the extracted vocabulary, we introduce several fragment-based contrastive and predictive pretraining tasks. The contrastive learning task jointly pretrains two different GNNs: one based on molecular graphs and one based on fragment graphs, which represents high-order connectivity within molecules. By enforcing the consistency between the fragment embedding and the aggregated embedding of the corresponding atoms from the molecular graphs, we ensure that both embeddings capture structural information at multiple resolutions. The structural information of the fragment graphs is further exploited to extract auxiliary labels for the graph-level predictive pretraining. We employ both the pretrained molecular-based and fragment-based GNNs for downstream prediction, thus utilizing the fragment information during finetuning. Our models advance the performances on 5 out of 8 common molecular benchmarks and improve the performances on long-range biological benchmarks by at least 11.5%.
    Joint Group Invariant Functions on Data-Parameter Domain Induce Universal Neural Networks. (arXiv:2310.03530v1 [cs.LG])
    The symmetry and geometry of input data are considered to be encoded in the internal data representation inside the neural network, but the specific encoding rule has been less investigated. By focusing on a joint group invariant function on the data-parameter domain, we present a systematic rule to find a dual group action on the parameter domain from a group action on the data domain. Further, we introduce generalized neural networks induced from the joint invariant functions, and present a new group theoretic proof of their universality theorems by using Schur's lemma. Since traditional universality theorems were demonstrated based on functional analytical methods, this study sheds light on the group theoretic aspect of the approximation theory, connecting geometric deep learning to abstract harmonic analysis.
    GPT-MolBERTa: GPT Molecular Features Language Model for molecular property prediction. (arXiv:2310.03030v1 [physics.chem-ph])
    With the emergence of Transformer architectures and their powerful understanding of textual data, a new horizon has opened up to predict the molecular properties based on text description. While SMILES are the most common form of representation, they are lacking robustness, rich information and canonicity, which limit their effectiveness in becoming generalizable representations. Here, we present GPT-MolBERTa, a self-supervised large language model (LLM) which uses detailed textual descriptions of molecules to predict their properties. A text based description of 326000 molecules were collected using ChatGPT and used to train LLM to learn the representation of molecules. To predict the properties for the downstream tasks, both BERT and RoBERTa models were used in the finetuning stage. Experiments show that GPT-MolBERTa performs well on various molecule property benchmarks, and approaching state of the art performance in regression tasks. Additionally, further analysis of the attention mechanisms show that GPT-MolBERTa is able to pick up important information from the input textual data, displaying the interpretability of the model.
    A Deep Reinforcement Learning Approach for Interactive Search with Sentence-level Feedback. (arXiv:2310.03043v1 [cs.LG])
    Interactive search can provide a better experience by incorporating interaction feedback from the users. This can significantly improve search accuracy as it helps avoid irrelevant information and captures the users' search intents. Existing state-of-the-art (SOTA) systems use reinforcement learning (RL) models to incorporate the interactions but focus on item-level feedback, ignoring the fine-grained information found in sentence-level feedback. Yet such feedback requires extensive RL action space exploration and large amounts of annotated data. This work addresses these challenges by proposing a new deep Q-learning (DQ) approach, DQrank. DQrank adapts BERT-based models, the SOTA in natural language processing, to select crucial sentences based on users' engagement and rank the items to obtain more satisfactory responses. We also propose two mechanisms to better explore optimal actions. DQrank further utilizes the experience replay mechanism in DQ to store the feedback sentences to obtain a better initial ranking performance. We validate the effectiveness of DQrank on three search datasets. The results show that DQRank performs at least 12% better than the previous SOTA RL approaches. We also conduct detailed ablation studies. The ablation results demonstrate that each model component can efficiently extract and accumulate long-term engagement effects from the users' sentence-level feedback. This structure offers new technologies with promised performance to construct a search system with sentence-level interaction.
    Differentiable Chemical Physics by Geometric Deep Learning for Gradient-based Property Optimization of Mixtures. (arXiv:2310.03047v1 [physics.chem-ph])
    Chemical mixtures, satisfying multi-objective performance metrics and constraints, enable their use in chemical processes and electrochemical devices. In this work, we develop a differentiable chemical-physics framework for modeling chemical mixtures, DiffMix, where geometric deep learning (GDL) is leveraged to map from molecular species, compositions and environment conditions, to physical coefficients in the mixture physics laws. In particular, we extend mixture thermodynamic and transport laws by creating learnable physical coefficients, where we use graph neural networks as the molecule encoder and enforce component-wise permutation-invariance. We start our model evaluations with thermodynamics of binary mixtures, and further benchmarked multicomponent electrolyte mixtures on their transport properties, in order to test the model generalizability. We show improved prediction accuracy and model robustness of DiffMix than its purely data-driven variants. Furthermore, we demonstrate the efficient optimization of electrolyte transport properties, built on the gradient obtained using DiffMix auto-differentiation. Our simulation runs are then backed up by the data generated by a robotic experimentation setup, Clio. By combining mixture physics and GDL, DiffMix expands the predictive modeling methods for chemical mixtures and provides low-cost optimization approaches in large chemical spaces.
    Modified LAB Algorithm with Clustering-based Search Space Reduction Method for solving Engineering Design Problems. (arXiv:2310.03055v1 [cs.LG])
    A modified LAB algorithm is introduced in this paper. It builds upon the original LAB algorithm (Reddy et al. 2023), which is a socio-inspired algorithm that models competitive and learning behaviours within a group, establishing hierarchical roles. The proposed algorithm incorporates the roulette wheel approach and a reduction factor introducing inter-group competition and iteratively narrowing down the sample space. The algorithm is validated by solving the benchmark test problems from CEC 2005 and CEC 2017. The solutions are validated using standard statistical tests such as two-sided and pairwise signed rank Wilcoxon test and Friedman rank test. The algorithm exhibited improved and superior robustness as well as search space exploration capabilities. Furthermore, a Clustering-Based Search Space Reduction (C-SSR) method is proposed, making the algorithm capable to solve constrained problems. The C-SSR method enables the algorithm to identify clusters of feasible regions, satisfying the constraints and contributing to achieve the optimal solution. This method demonstrates its effectiveness as a potential alternative to traditional constraint handling techniques. The results obtained using the Modified LAB algorithm are then compared with those achieved by other recent metaheuristic algorithms.
    Test Case Recommendations with Distributed Representation of Code Syntactic Features. (arXiv:2310.03174v1 [cs.LG])
    Frequent modifications of unit test cases are inevitable due to software's continuous underlying changes in source code, design, and requirements. Since manually maintaining software test suites is tedious, timely, and costly, automating the process of generation and maintenance of test units will significantly impact the effectiveness and efficiency of software testing processes. To this end, we propose an automated approach which exploits both structural and semantic properties of source code methods and test cases to recommend the most relevant and useful unit tests to the developers. The proposed approach initially trains a neural network to transform method-level source code, as well as unit tests, into distributed representations (embedded vectors) while preserving the importance of the structure in the code. Retrieving the semantic and structural properties of a given method, the approach computes cosine similarity between the method's embedding and the previously-embedded training instances. Further, according to the similarity scores between the embedding vectors, the model identifies the closest methods of embedding and the associated unit tests as the most similar recommendations. The results on the Methods2Test dataset showed that, while there is no guarantee to have similar relevant test cases for the group of similar methods, the proposed approach extracts the most similar existing test cases for a given method in the dataset, and evaluations show that recommended test cases decrease the developers' effort to generating expected test cases.
    Know2BIO: A Comprehensive Dual-View Benchmark for Evolving Biomedical Knowledge Graphs. (arXiv:2310.03221v1 [cs.LG])
    Knowledge graphs (KGs) have emerged as a powerful framework for representing and integrating complex biomedical information. However, assembling KGs from diverse sources remains a significant challenge in several aspects, including entity alignment, scalability, and the need for continuous updates to keep pace with scientific advancements. Moreover, the representative power of KGs is often limited by the scarcity of multi-modal data integration. To overcome these challenges, we propose Know2BIO, a general-purpose heterogeneous KG benchmark for the biomedical domain. Know2BIO integrates data from 30 diverse sources, capturing intricate relationships across 11 biomedical categories. It currently consists of ~219,000 nodes and ~6,200,000 edges. Know2BIO is capable of user-directed automated updating to reflect the latest knowledge in biomedical science. Furthermore, Know2BIO is accompanied by multi-modal data: node features including text descriptions, protein and compound sequences and structures, enabling the utilization of emerging natural language processing methods and multi-modal data integration strategies. We evaluate KG representation models on Know2BIO, demonstrating its effectiveness as a benchmark for KG representation learning in the biomedical field. Data and source code of Know2BIO are available at https://github.com/Yijia-Xiao/Know2BIO/.
    Point-PEFT: Parameter-Efficient Fine-Tuning for 3D Pre-trained Models. (arXiv:2310.03059v1 [cs.CV])
    The popularity of pre-trained large models has revolutionized downstream tasks across diverse fields, such as language, vision, and multi-modality. To minimize the adaption cost for downstream tasks, many Parameter-Efficient Fine-Tuning (PEFT) techniques are proposed for language and 2D image pre-trained models. However, the specialized PEFT method for 3D pre-trained models is still under-explored. To this end, we introduce Point-PEFT, a novel framework for adapting point cloud pre-trained models with minimal learnable parameters. Specifically, for a pre-trained 3D model, we freeze most of its parameters, and only tune the newly added PEFT modules on downstream tasks, which consist of a Point-prior Prompt and a Geometry-aware Adapter. The Point-prior Prompt adopts a set of learnable prompt tokens, for which we propose to construct a memory bank with domain-specific knowledge, and utilize a parameter-free attention to enhance the prompt tokens. The Geometry-aware Adapter aims to aggregate point cloud features within spatial neighborhoods to capture fine-grained geometric information through local interactions. Extensive experiments indicate that our Point-PEFT can achieve better performance than the full fine-tuning on various downstream tasks, while using only 5% of the trainable parameters, demonstrating the efficiency and effectiveness of our approach. Code will be released at https://github.com/EvenJoker/Point-PEFT.
    Batch-less stochastic gradient descent for compressive learning of deep regularization for image denoising. (arXiv:2310.03085v1 [cs.LG])
    We consider the problem of denoising with the help of prior information taken from a database of clean signals or images. Denoising with variational methods is very efficient if a regularizer well adapted to the nature of the data is available. Thanks to the maximum a posteriori Bayesian framework, such regularizer can be systematically linked with the distribution of the data. With deep neural networks (DNN), complex distributions can be recovered from a large training database.To reduce the computational burden of this task, we adapt the compressive learning framework to the learning of regularizers parametrized by DNN. We propose two variants of stochastic gradient descent (SGD) for the recovery of deep regularization parameters from a heavily compressed database. These algorithms outperform the initially proposed method that was limited to low-dimensional signals, each iteration using information from the whole database. They also benefit from classical SGD convergence guarantees. Thanks to these improvements we show that this method can be applied for patch based image denoising.}
    Crossed-IoT device portability of Electromagnetic Side Channel Analysis: Challenges and Dataset. (arXiv:2310.03119v1 [cs.LG])
    IoT (Internet of Things) refers to the network of interconnected physical devices, vehicles, home appliances, and other items embedded with sensors, software, and connectivity, enabling them to collect and exchange data. IoT Forensics is collecting and analyzing digital evidence from IoT devices to investigate cybercrimes, security breaches, and other malicious activities that may have taken place on these connected devices. In particular, EM-SCA has become an essential tool for IoT forensics due to its ability to reveal confidential information about the internal workings of IoT devices without interfering these devices or wiretapping their networks. However, the accuracy and reliability of EM-SCA results can be limited by device variability, environmental factors, and data collection and processing methods. Besides, there is very few research on these limitations that affects significantly the accuracy of EM-SCA approaches for the crossed-IoT device portability as well as limited research on the possible solutions to address such challenge. Therefore, this empirical study examines the impact of device variability on the accuracy and reliability of EM-SCA approaches, in particular machine-learning (ML) based approaches for EM-SCA. We firstly presents the background, basic concepts and techniques used to evaluate the limitations of current EM-SCA approaches and datasets. Our study then addresses one of the most important limitation, which is caused by the multi-core architecture of the processors (SoC). We present an approach to collect the EM-SCA datasets and demonstrate the feasibility of using transfer learning to obtain more meaningful and reliable results from EM-SCA in IoT forensics of crossed-IoT devices. Our study moreover contributes a new dataset for using deep learning models in analysing Electromagnetic Side-Channel data with regards to the cross-device portability matter.
    BID-NeRF: RGB-D image pose estimation with inverted Neural Radiance Fields. (arXiv:2310.03563v1 [cs.CV])
    We aim to improve the Inverted Neural Radiance Fields (iNeRF) algorithm which defines the image pose estimation problem as a NeRF based iterative linear optimization. NeRFs are novel neural space representation models that can synthesize photorealistic novel views of real-world scenes or objects. Our contributions are as follows: we extend the localization optimization objective with a depth-based loss function, we introduce a multi-image based loss function where a sequence of images with known relative poses are used without increasing the computational complexity, we omit hierarchical sampling during volumetric rendering, meaning only the coarse model is used for pose estimation, and we how that by extending the sampling interval convergence can be achieved even or higher initial pose estimate errors. With the proposed modifications the convergence speed is significantly improved, and the basin of convergence is substantially extended.
    Learning Concept-Based Visual Causal Transition and Symbolic Reasoning for Visual Planning. (arXiv:2310.03325v1 [cs.AI])
    Visual planning simulates how humans make decisions to achieve desired goals in the form of searching for visual causal transitions between an initial visual state and a final visual goal state. It has become increasingly important in egocentric vision with its advantages in guiding agents to perform daily tasks in complex environments. In this paper, we propose an interpretable and generalizable visual planning framework consisting of i) a novel Substitution-based Concept Learner (SCL) that abstracts visual inputs into disentangled concept representations, ii) symbol abstraction and reasoning that performs task planning via the self-learned symbols, and iii) a Visual Causal Transition model (ViCT) that grounds visual causal transitions to semantically similar real-world actions. Given an initial state, we perform goal-conditioned visual planning with a symbolic reasoning method fueled by the learned representations and causal transitions to reach the goal state. To verify the effectiveness of the proposed model, we collect a large-scale visual planning dataset based on AI2-THOR, dubbed as CCTP. Extensive experiments on this challenging dataset demonstrate the superior performance of our method in visual task planning. Empirically, we show that our framework can generalize to unseen task trajectories and unseen object categories.
    LESSON: Learning to Integrate Exploration Strategies for Reinforcement Learning via an Option Framework. (arXiv:2310.03342v1 [cs.LG])
    In this paper, a unified framework for exploration in reinforcement learning (RL) is proposed based on an option-critic model. The proposed framework learns to integrate a set of diverse exploration strategies so that the agent can adaptively select the most effective exploration strategy over time to realize a relevant exploration-exploitation trade-off for each given task. The effectiveness of the proposed exploration framework is demonstrated by various experiments in the MiniGrid and Atari environments.
    Posterior Sampling Based on Gradient Flows of the MMD with Negative Distance Kernel. (arXiv:2310.03054v1 [stat.ML])
    We propose conditional flows of the maximum mean discrepancy (MMD) with the negative distance kernel for posterior sampling and conditional generative modeling. This MMD, which is also known as energy distance, has several advantageous properties like efficient computation via slicing and sorting. We approximate the joint distribution of the ground truth and the observations using discrete Wasserstein gradient flows and establish an error bound for the posterior distributions. Further, we prove that our particle flow is indeed a Wasserstein gradient flow of an appropriate functional. The power of our method is demonstrated by numerical examples including conditional image generation and inverse problems like superresolution, inpainting and computed tomography in low-dose and limited-angle settings.
    Fine-tune Language Models to Approximate Unbiased In-context Learning. (arXiv:2310.03331v1 [cs.LG])
    In-context learning (ICL) is an astonishing emergent ability of large language models (LLMs). By presenting a prompt that includes multiple input-output pairs as examples and introducing a new query input, models can generate the corresponding output. However, the performance of models heavily relies on the quality of the input prompt when implementing in-context learning. Biased or imbalanced input prompts can significantly degrade the performance of language models. To address this issue, we introduce a reweighted algorithm called RICL (Reweighted In-context Learning). This algorithm fine-tunes language models using an unbiased validation set to determine the optimal weight for each input-output example to approximate unbiased in-context learning. Furthermore, we also introduce a low-cost reweighted algorithm, a linear optimal weight approximation algorithm called LARICL (Linear Approximation of Reweighted In-context Learning). This algorithm requires minimal training cost while providing effective results. We prove the convergence of our algorithm and validate its performance through experiments conducted on a numerical dataset. The experimental findings reveal a substantial improvement in comparison to benchmarks including the performance of casual prompt-based in-context learning and the performance of a classic fine-tuning method.
    The Blame Problem in Evaluating Local Explanations, and How to Tackle it. (arXiv:2310.03466v1 [cs.LG])
    The number of local model-agnostic explanation techniques proposed has grown rapidly recently. One main reason is that the bar for developing new explainability techniques is low due to the lack of optimal evaluation measures. Without rigorous measures, it is hard to have concrete evidence of whether the new explanation techniques can significantly outperform their predecessors. Our study proposes a new taxonomy for evaluating local explanations: robustness, evaluation using ground truth from synthetic datasets and interpretable models, model randomization, and human-grounded evaluation. Using this proposed taxonomy, we highlight that all categories of evaluation methods, except those based on the ground truth from interpretable models, suffer from a problem we call the "blame problem." In our study, we argue that this category of evaluation measure is a more reasonable method for evaluating local model-agnostic explanations. However, we show that even this category of evaluation measures has further limitations. The evaluation of local explanations remains an open research problem.
    Certifiably Robust Graph Contrastive Learning. (arXiv:2310.03312v1 [cs.CR])
    Graph Contrastive Learning (GCL) has emerged as a popular unsupervised graph representation learning method. However, it has been shown that GCL is vulnerable to adversarial attacks on both the graph structure and node attributes. Although empirical approaches have been proposed to enhance the robustness of GCL, the certifiable robustness of GCL is still remain unexplored. In this paper, we develop the first certifiably robust framework in GCL. Specifically, we first propose a unified criteria to evaluate and certify the robustness of GCL. We then introduce a novel technique, RES (Randomized Edgedrop Smoothing), to ensure certifiable robustness for any GCL model, and this certified robustness can be provably preserved in downstream tasks. Furthermore, an effective training method is proposed for robust GCL. Extensive experiments on real-world datasets demonstrate the effectiveness of our proposed method in providing effective certifiable robustness and enhancing the robustness of any GCL model. The source code of RES is available at https://github.com/ventr1c/RES-GCL.
    SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks. (arXiv:2310.03684v1 [cs.LG])
    Despite efforts to align large language models (LLMs) with human values, widely-used LLMs such as GPT, Llama, Claude, and PaLM are susceptible to jailbreaking attacks, wherein an adversary fools a targeted LLM into generating objectionable content. To address this vulnerability, we propose SmoothLLM, the first algorithm designed to mitigate jailbreaking attacks on LLMs. Based on our finding that adversarially-generated prompts are brittle to character-level changes, our defense first randomly perturbs multiple copies of a given input prompt, and then aggregates the corresponding predictions to detect adversarial inputs. SmoothLLM reduces the attack success rate on numerous popular LLMs to below one percentage point, avoids unnecessary conservatism, and admits provable guarantees on attack mitigation. Moreover, our defense uses exponentially fewer queries than existing attacks and is compatible with any LLM.
    BTDNet: a Multi-Modal Approach for Brain Tumor Radiogenomic Classification. (arXiv:2310.03485v1 [eess.IV])
    Brain tumors pose significant health challenges worldwide, with glioblastoma being one of the most aggressive forms. Accurate determination of the O6-methylguanine-DNA methyltransferase (MGMT) promoter methylation status is crucial for personalized treatment strategies. However, traditional methods are labor-intensive and time-consuming. This paper proposes a novel multi-modal approach, BTDNet, leveraging multi-parametric MRI scans, including FLAIR, T1w, T1wCE, and T2 3D volumes, to predict MGMT promoter methylation status. BTDNet addresses two main challenges: the variable volume lengths (i.e., each volume consists of a different number of slices) and the volume-level annotations (i.e., the whole 3D volume is annotated and not the independent slices that it consists of). BTDNet consists of four components: i) the data augmentation one (that performs geometric transformations, convex combinations of data pairs and test-time data augmentation); ii) the 3D analysis one (that performs global analysis through a CNN-RNN); iii) the routing one (that contains a mask layer that handles variable input feature lengths), and iv) the modality fusion one (that effectively enhances data representation, reduces ambiguities and mitigates data scarcity). The proposed method outperforms by large margins the state-of-the-art methods in the RSNA-ASNR-MICCAI BraTS 2021 Challenge, offering a promising avenue for enhancing brain tumor diagnosis and treatment.
    Otago Exercises Monitoring for Older Adults by a Single IMU and Hierarchical Machine Learning Models. (arXiv:2310.03512v1 [cs.LG])
    Otago Exercise Program (OEP) is a rehabilitation program for older adults to improve frailty, sarcopenia, and balance. Accurate monitoring of patient involvement in OEP is challenging, as self-reports (diaries) are often unreliable. With the development of wearable sensors, Human Activity Recognition (HAR) systems using wearable sensors have revolutionized healthcare. However, their usage for OEP still shows limited performance. The objective of this study is to build an unobtrusive and accurate system to monitor OEP for older adults. Data was collected from older adults wearing a single waist-mounted Inertial Measurement Unit (IMU). Two datasets were collected, one in a laboratory setting, and one at the homes of the patients. A hierarchical system is proposed with two stages: 1) using a deep learning model to recognize whether the patients are performing OEP or activities of daily life (ADLs) using a 10-minute sliding window; 2) based on stage 1, using a 6-second sliding window to recognize the OEP sub-classes performed. The results showed that in stage 1, OEP could be recognized with window-wise f1-scores over 0.95 and Intersection-over-Union (IoU) f1-scores over 0.85 for both datasets. In stage 2, for the home scenario, four activities could be recognized with f1-scores over 0.8: ankle plantarflexors, abdominal muscles, knee bends, and sit-to-stand. The results showed the potential of monitoring the compliance of OEP using a single IMU in daily life. Also, some OEP sub-classes are possible to be recognized for further analysis.
    Variational Inference for GARCH-family Models. (arXiv:2310.03435v1 [stat.ML])
    The Bayesian estimation of GARCH-family models has been typically addressed through Monte Carlo sampling. Variational Inference is gaining popularity and attention as a robust approach for Bayesian inference in complex machine learning models; however, its adoption in econometrics and finance is limited. This paper discusses the extent to which Variational Inference constitutes a reliable and feasible alternative to Monte Carlo sampling for Bayesian inference in GARCH-like models. Through a large-scale experiment involving the constituents of the S&P 500 index, several Variational Inference optimizers, a variety of volatility models, and a case study, we show that Variational Inference is an attractive, remarkably well-calibrated, and competitive method for Bayesian learning.
    Explaining Emergent In-Context Learning as Kernel Regression. (arXiv:2305.12766v2 [cs.CL] UPDATED)
    Large language models (LLMs) have initiated a paradigm shift in transfer learning. In contrast to the classic pretraining-then-finetuning procedure, in order to use LLMs for downstream prediction tasks, one only needs to provide a few demonstrations, known as in-context examples, without adding more or updating existing model parameters. This in-context learning (ICL) capability of LLMs is intriguing, and it is not yet fully understood how pretrained LLMs acquire such capabilities. In this paper, we investigate the reason why a transformer-based language model can accomplish in-context learning after pre-training on a general language corpus by proposing one hypothesis that LLMs can simulate kernel regression with internal representations when faced with in-context examples. More concretely, we first prove that Bayesian inference on in-context prompts can be asymptotically understood as kernel regression $\hat y = \sum_i y_i K(x, x_i)/\sum_i K(x, x_i)$ as the number of in-context demonstrations grows. Then, we empirically investigate the in-context behaviors of language models. We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression. Finally, our theory provides insights into multiple phenomena observed in the ICL field: why retrieving demonstrative samples similar to test samples can help, why ICL performance is sensitive to the output formats, and why ICL accuracy benefits from selecting in-distribution and representative samples.
    Deep Variational Multivariate Information Bottleneck -- A Framework for Variational Losses. (arXiv:2310.03311v1 [cs.LG])
    Variational dimensionality reduction methods are known for their high accuracy, generative abilities, and robustness. These methods have many theoretical justifications. Here we introduce a unifying principle rooted in information theory to rederive and generalize existing variational methods and design new ones. We base our framework on an interpretation of the multivariate information bottleneck, in which two Bayesian networks are traded off against one another. We interpret the first network as an encoder graph, which specifies what information to keep when compressing the data. We interpret the second network as a decoder graph, which specifies a generative model for the data. Using this framework, we rederive existing dimensionality reduction methods such as the deep variational information bottleneck (DVIB), beta variational auto-encoders (beta-VAE), and deep variational canonical correlation analysis (DVCCA). The framework naturally introduces a trade-off parameter between compression and reconstruction in the DVCCA family of algorithms, resulting in the new beta-DVCCA family. In addition, we derive a new variational dimensionality reduction method, deep variational symmetric informational bottleneck (DVSIB), which simultaneously compresses two variables to preserve information between their compressed representations. We implement all of these algorithms and evaluate their ability to produce shared low dimensional latent spaces on a modified noisy MNIST dataset. We show that algorithms that are better matched to the structure of the data (beta-DVCCA and DVSIB) produce better latent spaces as measured by classification accuracy and the dimensionality of the latent variables. We believe that this framework can be used to unify other multi-view representation learning algorithms. Additionally, it provides a straightforward framework for deriving problem-specific loss functions.
    Memory Capacity of Recurrent Neural Networks with Matrix Representation. (arXiv:2104.07454v3 [cs.LG] UPDATED)
    It is well known that canonical recurrent neural networks (RNNs) face limitations in learning long-term dependencies which have been addressed by memory structures in long short-term memory (LSTM) networks. Neural Turing machines (NTMs) are novel RNNs that implement the notion of programmable computers with neural network controllers that can learn simple algorithmic tasks. Matrix neural networks feature matrix representation which inherently preserves the spatial structure of data when compared to canonical neural networks that use vector-based representation. One may then argue that neural networks with matrix representations may have the potential to provide better memory capacity. In this paper, we define and study a probabilistic notion of memory capacity based on Fisher information for matrix-based RNNs. We find bounds on memory capacity for such networks under various hypotheses and compare them with their vector counterparts. In particular, we show that the memory capacity of such networks is bounded by $N^2$ for $N\times N$ state matrix which generalizes the one known for vector networks. We also show and analyze the increase in memory capacity for such networks which is introduced when one exhibits an external state memory, such as NTMs. Consequently, we construct NTMs with RNN controllers with matrix-based representation of external memory, leading us to introduce Matrix NTMs. We demonstrate the performance of this class of memory networks under certain algorithmic learning tasks such as copying and recall and compare it with Matrix RNNs. We find an improvement in the performance of Matrix NTMs by the addition of external memory, in comparison to Matrix RNNs.
    Distributional PAC-Learning from Nisan's Natural Proofs. (arXiv:2310.03641v1 [cs.CC])
    (Abridged) Carmosino et al. (2016) demonstrated that natural proofs of circuit lower bounds for \Lambda imply efficient algorithms for learning \Lambda-circuits, but only over the uniform distribution, with membership queries, and provided \AC^0[p] \subseteq \Lambda. We consider whether this implication can be generalized to \Lambda \not\supseteq \AC^0[p], and to learning algorithms in Valiant's PAC model, which use only random examples and learn over arbitrary example distributions. We give results of both positive and negative flavor. On the negative side, we observe that if, for every circuit class \Lambda, the implication from natural proofs for \Lambda to learning \Lambda-circuits in Valiant's PAC model holds, then there is a polynomial time solution to O(n^{1.5})-uSVP (unique Shortest Vector Problem), and polynomial time quantum solutions to O(n^{1.5})-SVP (Shortest Vector Problem) and O(n^{1.5})-SIVP (Shortest Independent Vector Problem). This indicates that whether natural proofs for \Lambda imply efficient learning algorithms for \Lambda in Valiant's PAC model may depend on \Lambda. On the positive side, our main result is that specific natural proofs arising from a type of communication complexity argument (e.g., Nisan (1993), for depth-2 majority circuits) imply PAC-learning algorithms in a new distributional variant of Valiant's model. Our distributional PAC model is stronger than the average-case prediction model of Blum et al (1993) and the heuristic PAC model of Nanashima (2021), and has several important properties which make it of independent interest, such as being boosting-friendly. The main applications of our result are new distributional PAC-learning algorithms for depth-2 majority circuits, polytopes and DNFs over natural target distributions, as well as the nonexistence of encoded-input weak PRFs that can be evaluated by depth-2 majority circuits.
    Over-the-Air Federated Learning with Compressed Sensing: Is Sparsification Necessary?. (arXiv:2310.03410v1 [cs.IT])
    Over-the-Air (OtA) Federated Learning (FL) refers to an FL system where multiple agents apply OtA computation for transmitting model updates to a common edge server. Two important features of OtA computation, namely linear processing and signal-level superposition, motivate the use of linear compression with compressed sensing (CS) methods to reduce the number of data samples transmitted over the channel. The previous works on applying CS methods in OtA FL have primarily assumed that the original model update vectors are sparse, or they have been sparsified before compression. However, it is unclear whether linear compression with CS-based reconstruction is more effective than directly sending the non-zero elements in the sparsified update vectors, under the same total power constraint. In this study, we examine and compare several communication designs with or without sparsification. Our findings demonstrate that sparsification before compression is not necessary. Alternatively, sparsification without linear compression can also achieve better performance than the commonly considered setup that combines both.
    FASER: Binary Code Similarity Search through the use of Intermediate Representations. (arXiv:2310.03605v1 [cs.CR])
    Being able to identify functions of interest in cross-architecture software is useful whether you are analysing for malware, securing the software supply chain or conducting vulnerability research. Cross-Architecture Binary Code Similarity Search has been explored in numerous studies and has used a wide range of different data sources to achieve its goals. The data sources typically used draw on common structures derived from binaries such as function control flow graphs or binary level call graphs, the output of the disassembly process or the outputs of a dynamic analysis approach. One data source which has received less attention is binary intermediate representations. Binary Intermediate representations possess two interesting properties: they are cross architecture by their very nature and encode the semantics of a function explicitly to support downstream usage. Within this paper we propose Function as a String Encoded Representation (FASER) which combines long document transformers with the use of intermediate representations to create a model capable of cross architecture function search without the need for manual feature engineering, pre-training or a dynamic analysis step. We compare our approach against a series of baseline approaches for two tasks; A general function search task and a targeted vulnerability search task. Our approach demonstrates strong performance across both tasks, performing better than all baseline approaches.
    LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers. (arXiv:2310.03294v1 [cs.LG])
    Increasing the context length of large language models (LLMs) unlocks fundamentally new capabilities, but also significantly increases the memory footprints of training. Previous model-parallel systems such as Megatron-LM partition and compute different attention heads in parallel, resulting in large communication volumes, so they cannot scale beyond the number of attention heads, thereby hindering its adoption. In this paper, we introduce a new approach, LightSeq, for long-context LLMs training. LightSeq has many notable advantages. First, LightSeq partitions over the sequence dimension, hence is agnostic to model architectures and readily applicable for models with varying numbers of attention heads, such as Multi-Head, Multi-Query and Grouped-Query attention. Second, LightSeq not only requires up to 4.7x less communication than Megatron-LM on popular LLMs but also overlaps the communication with computation. To further reduce the training time, LightSeq features a novel gradient checkpointing scheme to bypass an forward computation for memory-efficient attention. We evaluate LightSeq on Llama-7B and its variants with sequence lengths from 32K to 512K. Through comprehensive experiments on single and cross-node training, we show that LightSeq achieves up to 1.24-2.01x end-to-end speedup, and a 2-8x longer sequence length on models with fewer heads, compared to Megatron-LM. Codes will be available at https://github.com/RulinShao/LightSeq.
    Neural architecture impact on identifying temporally extended Reinforcement Learning tasks. (arXiv:2310.03161v1 [cs.LG])
    Inspired by recent developments in attention models for image classification and natural language processing, we present various Attention based architectures in reinforcement learning (RL) domain, capable of performing well on OpenAI Gym Atari-2600 game suite. In spite of the recent success of Deep Reinforcement learning techniques in various fields like robotics, gaming and healthcare, they suffer from a major drawback that neural networks are difficult to interpret. We try to get around this problem with the help of Attention based models. In Attention based models, extracting and overlaying of attention map onto images allows for direct observation of information used by agent to select actions and easier interpretation of logic behind the chosen actions. Our models in addition to playing well on gym-Atari environments, also provide insights on how agent perceives its environment. In addition, motivated by recent developments in attention based video-classification models using Vision Transformer, we come up with an architecture based on Vision Transformer, for image-based RL domain too. Compared to previous works in Vision Transformer, our model is faster to train and requires fewer computational resources. 3
    SAF: Smart Aggregation Framework for Revealing Atoms Importance Rank and Improving Prediction Rates in Drug Discovery. (arXiv:2310.03028v1 [physics.chem-ph])
    Machine learning, and representation learning in particular, has the potential to facilitate drug discovery by screening a large chemical space in silico. A successful approach for representing molecules is to treat them as a graph and utilize graph neural networks. One of the key limitations of such methods is the necessity to represent compounds with different numbers of atoms, which requires aggregating the atom's information. Common aggregation operators, such as averaging, result in loss of information at the atom level. In this work, we propose a novel aggregating approach where each atom is weighted non-linearly using the Boltzmann distribution with a hyperparameter analogous to temperature. We show that using this weighted aggregation improves the ability of the gold standard message-passing neural network to predict antibiotic activity. Moreover, by changing the temperature hyperparameter, our approach can reveal the atoms that are important for activity prediction in a smooth and consistent way, thus providing a novel, regulated attention mechanism for graph neural networks. We further validate our method by showing that it recapitulates the functional group in beta-Lactam antibiotics. The ability of our approach to rank the atoms' importance for a desired function can be used within any graph neural network to provide interpretability of the results and predictions at the node level.
    Non-Smooth Weakly-Convex Finite-sum Coupled Compositional Optimization. (arXiv:2310.03234v1 [math.OC])
    This paper investigates new families of compositional optimization problems, called $\underline{\bf n}$on-$\underline{\bf s}$mooth $\underline{\bf w}$eakly-$\underline{\bf c}$onvex $\underline{\bf f}$inite-sum $\underline{\bf c}$oupled $\underline{\bf c}$ompositional $\underline{\bf o}$ptimization (NSWC FCCO). There has been a growing interest in FCCO due to its wide-ranging applications in machine learning and AI, as well as its ability to address the shortcomings of stochastic algorithms based on empirical risk minimization. However, current research on FCCO presumes that both the inner and outer functions are smooth, limiting their potential to tackle a more diverse set of problems. Our research expands on this area by examining non-smooth weakly-convex FCCO, where the outer function is weakly convex and non-decreasing, and the inner function is weakly-convex. We analyze a single-loop algorithm and establish its complexity for finding an $\epsilon$-stationary point of the Moreau envelop of the objective function. Additionally, we also extend the algorithm to solving novel non-smooth weakly-convex tri-level finite-sum coupled compositional optimization problems, which feature a nested arrangement of three functions. Lastly, we explore the applications of our algorithms in deep learning for two-way partial AUC maximization and multi-instance two-way partial AUC maximization, using empirical studies to showcase the effectiveness of the proposed algorithms.
    Graph-enhanced Optimizers for Structure-aware Recommendation Embedding Evolution. (arXiv:2310.03032v1 [cs.IR])
    Embedding plays a critical role in modern recommender systems because they are virtual representations of real-world entities and the foundation for subsequent decision models. In this paper, we propose a novel embedding update mechanism, Structure-aware Embedding Evolution (SEvo for short), to encourage related nodes to evolve similarly at each step. Unlike GNN (Graph Neural Network) that typically serves as an intermediate part, SEvo is able to directly inject the graph structure information into embedding with negligible computational overhead in training. The convergence properties of SEvo as well as its possible variants are theoretically analyzed to justify the validity of the designs. Moreover, SEvo can be seamlessly integrated into existing optimizers for state-of-the-art performance. In particular, SEvo-enhanced AdamW with moment estimate correction demonstrates consistent improvements across a spectrum of models and datasets, suggesting a novel technical route to effectively utilize graph structure information beyond explicit GNN modules.
    Efficient Federated Prompt Tuning for Black-box Large Pre-trained Models. (arXiv:2310.03123v1 [cs.LG])
    With the blowout development of pre-trained models (PTMs), the efficient tuning of these models for diverse downstream applications has emerged as a pivotal research concern. Although recent investigations into prompt tuning have provided promising avenues, three salient challenges persist: (1) memory constraint: the continuous growth in the size of open-source PTMs renders fine-tuning, even a fraction of their parameters, challenging for many practitioners. (2) model privacy: existing PTMs often function as public API services, with their parameters inaccessible for effective or tailored fine-tuning. (3) data privacy: the fine-tuning of PTMs necessitates high-quality datasets, which are typically localized and not shared to public. To optimally harness each local dataset while navigating memory constraints and preserving privacy, we propose Federated Black-Box Prompt Tuning (Fed-BBPT). This innovative approach eschews reliance on parameter architectures and private dataset access, instead capitalizing on a central server that aids local users in collaboratively training a prompt generator through regular aggregation. Local users leverage API-driven learning via a zero-order optimizer, obviating the need for PTM deployment. Relative to extensive fine-tuning, Fed-BBPT proficiently sidesteps memory challenges tied to PTM storage and fine-tuning on local machines, tapping into comprehensive, high-quality, yet private training datasets. A thorough evaluation across 40 datasets spanning CV and NLP tasks underscores the robustness of our proposed model.
    Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly. (arXiv:2310.03150v1 [cs.LG])
    Large Language Models (LLM) and foundation models are popular as they offer new opportunities for individuals and businesses to improve natural language processing, interact with data, and retrieve information faster. However, training or fine-tuning LLMs requires a vast amount of data, which can be challenging to access due to legal or technical restrictions and may require private computing resources. Federated Learning (FL) is a solution designed to overcome these challenges and expand data access for deep learning applications. This paper takes a hardware-centric approach to explore how LLMs can be brought to modern edge computing systems. Our study fine-tunes the FLAN-T5 model family, ranging from 80M to 3B parameters, using FL for a text summarization task. We provide a micro-level hardware benchmark, compare the model FLOP utilization to a state-of-the-art data center GPU, and study the network utilization in realistic conditions. Our contribution is twofold: First, we evaluate the current capabilities of edge computing systems and their potential for LLM FL workloads. Second, by comparing these systems with a data-center GPU, we demonstrate the potential for improvement and the next steps toward achieving greater computational efficiency at the edge.
    Digital Ethics in Federated Learning. (arXiv:2310.03178v1 [cs.LG])
    The Internet of Things (IoT) consistently generates vast amounts of data, sparking increasing concern over the protection of data privacy and the limitation of data misuse. Federated learning (FL) facilitates collaborative capabilities among multiple parties by sharing machine learning (ML) model parameters instead of raw user data, and it has recently gained significant attention for its potential in privacy preservation and learning efficiency enhancement. In this paper, we highlight the digital ethics concerns that arise when human-centric devices serve as clients in FL. More specifically, challenges of game dynamics, fairness, incentive, and continuity arise in FL due to differences in perspectives and objectives between clients and the server. We analyze these challenges and their solutions from the perspectives of both the client and the server, and through the viewpoints of centralized and decentralized FL. Finally, we explore the opportunities in FL for human-centric IoT as directions for future development.
    UniPredict: Large Language Models are Universal Tabular Predictors. (arXiv:2310.03266v1 [cs.LG])
    Tabular data prediction is a fundamental machine learning task for many applications. Existing methods predominantly employ discriminative modeling and operate under the assumption of a fixed target column, necessitating re-training for every new predictive task. Inspired by the generative power of large language models (LLMs), this paper exploits the idea of building universal tabular data predictors based on generative modeling, namely UniPredict. Here, we show that scaling up an LLM to extensive tabular datasets with the capability of comprehending diverse tabular inputs and predicting for target variables following the input instructions. Specifically, we train a single LLM on an aggregation of 169 tabular datasets with diverse targets and compare its performance against baselines that are trained on each dataset separately. We observe this versatile UniPredict model demonstrates an advantage over other models, ranging from 5.4% to 13.4%, when compared with the best tree-boosting baseline and the best neural network baseline, respectively. We further test UniPredict in few-shot learning settings on another 62 tabular datasets. Our method achieves strong performance in quickly adapting to new tasks, where our method outperforms XGBoost over 100% on the low-resource setup and shows a significant margin over all baselines. We envision that UniPredict sheds light on developing a universal tabular data prediction system that learns from data at scale and serves a wide range of prediction tasks.
    OpenMM 8: Molecular Dynamics Simulation with Machine Learning Potentials. (arXiv:2310.03121v1 [physics.chem-ph])
    Machine learning plays an important and growing role in molecular simulation. The newest version of the OpenMM molecular dynamics toolkit introduces new features to support the use of machine learning potentials. Arbitrary PyTorch models can be added to a simulation and used to compute forces and energy. A higher-level interface allows users to easily model their molecules of interest with general purpose, pretrained potential functions. A collection of optimized CUDA kernels and custom PyTorch operations greatly improves the speed of simulations. We demonstrate these features on simulations of cyclin-dependent kinase 8 (CDK8) and the green fluorescent protein (GFP) chromophore in water. Taken together, these features make it practical to use machine learning to improve the accuracy of simulations at only a modest increase in cost.
    Multi-Task Learning For Reduced Popularity Bias In Multi-Territory Video Recommendations. (arXiv:2310.03148v1 [cs.IR])
    Various data imbalances that naturally arise in a multi-territory personalized recommender system can lead to a significant item bias for globally prevalent items. A locally popular item can be overshadowed by a globally prevalent item. Moreover, users' viewership patterns/statistics can drastically change from one geographic location to another which may suggest to learn specific user embeddings. In this paper, we propose a multi-task learning (MTL) technique, along with an adaptive upsampling method to reduce popularity bias in multi-territory recommendations. Our proposed framework is designed to enrich training examples with active users representation through upsampling, and capable of learning geographic-based user embeddings by leveraging MTL. Through experiments, we demonstrate the effectiveness of our framework in multiple territories compared to a baseline not incorporating our proposed techniques.~Noticeably, we show improved relative gain of up to $65.27\%$ in PR-AUC metric. A case study is presented to demonstrate the advantages of our methods in attenuating the popularity bias of global items.
    Deep Learning in Computational Biology: Advancements, Challenges, and Future Outlook. (arXiv:2310.03086v1 [cs.LG])
    Deep learning has become a powerful tool in computational biology, revolutionising the analysis and interpretation of biological data over time. In our article review, we delve into various aspects of deep learning in computational biology. Specifically, we examine its history, advantages, and challenges. Our focus is on two primary applications: DNA sequence classification and prediction, as well as protein structure prediction from sequence data. Additionally, we provide insights into the outlook for this field. To fully harness the potential of deep learning in computational biology, it is crucial to address the challenges that come with it. These challenges include the requirement for large, labelled datasets and the interpretability of deep learning models. The use of deep learning in the analysis of DNA sequences has brought about a significant transformation in the detection of genomic variants and the analysis of gene expression. This has greatly contributed to the advancement of personalised medicine and drug discovery. Convolutional neural networks (CNNs) have been shown to be highly accurate in predicting genetic variations and gene expression levels. Deep learning techniques are used for analysing epigenetic data, including DNA methylation and histone modifications. This provides valuable insights into metabolic conditions and gene regulation. The field of protein structure prediction has been significantly impacted by deep learning, which has enabled accurate determination of the three-dimensional shape of proteins and prediction of their interactions. The future of deep learning in computational biology looks promising. With the development of advanced deep learning models and interpretation techniques, there is potential to overcome current challenges and further our understanding of biological systems.
    Deep reinforcement learning for machine scheduling: Methodology, the state-of-the-art, and future directions. (arXiv:2310.03195v1 [cs.LG])
    Machine scheduling aims to optimize job assignments to machines while adhering to manufacturing rules and job specifications. This optimization leads to reduced operational costs, improved customer demand fulfillment, and enhanced production efficiency. However, machine scheduling remains a challenging combinatorial problem due to its NP-hard nature. Deep Reinforcement Learning (DRL), a key component of artificial general intelligence, has shown promise in various domains like gaming and robotics. Researchers have explored applying DRL to machine scheduling problems since 1995. This paper offers a comprehensive review and comparison of DRL-based approaches, highlighting their methodology, applications, advantages, and limitations. It categorizes these approaches based on computational components: conventional neural networks, encoder-decoder architectures, graph neural networks, and metaheuristic algorithms. Our review concludes that DRL-based methods outperform exact solvers, heuristics, and tabular reinforcement learning algorithms in terms of computation speed and generating near-global optimal solutions. These DRL-based approaches have been successfully applied to static and dynamic scheduling across diverse machine environments and job characteristics. However, DRL-based schedulers face limitations in handling complex operational constraints, configurable multi-objective optimization, generalization, scalability, interpretability, and robustness. Addressing these challenges will be a crucial focus for future research in this field. This paper serves as a valuable resource for researchers to assess the current state of DRL-based machine scheduling and identify research gaps. It also aids experts and practitioners in selecting the appropriate DRL approach for production scheduling.
    Learning Energy-Based Prior Model with Diffusion-Amortized MCMC. (arXiv:2310.03218v1 [cs.LG])
    Latent space Energy-Based Models (EBMs), also known as energy-based priors, have drawn growing interests in the field of generative modeling due to its flexibility in the formulation and strong modeling power of the latent space. However, the common practice of learning latent space EBMs with non-convergent short-run MCMC for prior and posterior sampling is hindering the model from further progress; the degenerate MCMC sampling quality in practice often leads to degraded generation quality and instability in training, especially with highly multi-modal and/or high-dimensional target distributions. To remedy this sampling issue, in this paper we introduce a simple but effective diffusion-based amortization method for long-run MCMC sampling and develop a novel learning algorithm for the latent space EBM based on it. We provide theoretical evidence that the learned amortization of MCMC is a valid long-run MCMC sampler. Experiments on several image modeling benchmark datasets demonstrate the superior performance of our method compared with strong counterparts
    Attributing Learned Concepts in Neural Networks to Training Data. (arXiv:2310.03149v1 [cs.LG])
    By now there is substantial evidence that deep learning models learn certain human-interpretable features as part of their internal representations of data. As having the right (or wrong) concepts is critical to trustworthy machine learning systems, it is natural to ask which inputs from the model's original training set were most important for learning a concept at a given layer. To answer this, we combine data attribution methods with methods for probing the concepts learned by a model. Training network and probe ensembles for two concept datasets on a range of network layers, we use the recently developed TRAK method for large-scale data attribution. We find some evidence for convergence, where removing the 10,000 top attributing images for a concept and retraining the model does not change the location of the concept in the network nor the probing sparsity of the concept. This suggests that rather than being highly dependent on a few specific examples, the features that inform the development of a concept are spread in a more diffuse manner across its exemplars, implying robustness in concept formation.
    Fairness-enhancing mixed effects deep learning improves fairness on in- and out-of-distribution clustered (non-iid) data. (arXiv:2310.03146v1 [cs.LG])
    Traditional deep learning (DL) suffers from two core problems. Firstly, it assumes training samples are independent and identically distributed. However, numerous real-world datasets group samples by shared measurements (e.g., study participants or cells), violating this assumption. In these scenarios, DL can show compromised performance, limited generalization, and interpretability issues, coupled with cluster confounding causing Type 1 and 2 errors. Secondly, models are typically trained for overall accuracy, often neglecting underrepresented groups and introducing biases in crucial areas like loan approvals or determining health insurance rates, such biases can significantly impact one's quality of life. To address both of these challenges simultaneously, we present a mixed effects deep learning (MEDL) framework. MEDL separately quantifies cluster-invariant fixed effects (FE) and cluster-specific random effects (RE) through the introduction of: 1) a cluster adversary which encourages the learning of cluster-invariant FE, 2) a Bayesian neural network which quantifies the RE, and a mixing function combining the FE an RE into a mixed-effect prediction. We marry this MEDL with adversarial debiasing, which promotes equality-of-odds fairness across FE, RE, and ME predictions for fairness-sensitive variables. We evaluated our approach using three datasets: two from census/finance focusing on income classification and one from healthcare predicting hospitalization duration, a regression task. Our framework notably enhances fairness across all sensitive variables-increasing fairness up to 82% for age, 43% for race, 86% for sex, and 27% for marital-status. Besides promoting fairness, our method maintains the robust performance and clarity of MEDL. It's versatile, suitable for various dataset types and tasks, making it broadly applicable. Our GitHub repository houses the implementation.
    Robust and Interpretable Medical Image Classifiers via Concept Bottleneck Models. (arXiv:2310.03182v1 [cs.CV])
    Medical image classification is a critical problem for healthcare, with the potential to alleviate the workload of doctors and facilitate diagnoses of patients. However, two challenges arise when deploying deep learning models to real-world healthcare applications. First, neural models tend to learn spurious correlations instead of desired features, which could fall short when generalizing to new domains (e.g., patients with different ages). Second, these black-box models lack interpretability. When making diagnostic predictions, it is important to understand why a model makes a decision for trustworthy and safety considerations. In this paper, to address these two limitations, we propose a new paradigm to build robust and interpretable medical image classifiers with natural language concepts. Specifically, we first query clinical concepts from GPT-4, then transform latent image features into explicit concepts with a vision-language model. We systematically evaluate our method on eight medical image classification datasets to verify its effectiveness. On challenging datasets with strong confounding factors, our method can mitigate spurious correlations thus substantially outperform standard visual encoders and other baselines. Finally, we show how classification with a small number of concepts brings a level of interpretability for understanding model decisions through case studies in real medical data.
    Context-Based Tweet Engagement Prediction. (arXiv:2310.03147v1 [cs.IR])
    Twitter is currently one of the biggest social media platforms. Its users may share, read, and engage with short posts called tweets. For the ACM Recommender Systems Conference 2020, Twitter published a dataset around 70 GB in size for the annual RecSys Challenge. In 2020, the RecSys Challenge invited participating teams to create models that would predict engagement likelihoods for given user-tweet combinations. The submitted models predicting like, reply, retweet, and quote engagements were evaluated based on two metrics: area under the precision-recall curve (PRAUC) and relative cross-entropy (RCE). In this diploma thesis, we used the RecSys 2020 Challenge dataset and evaluation procedure to investigate how well context alone may be used to predict tweet engagement likelihood. In doing so, we employed the Spark engine on TU Wien's Little Big Data Cluster to create scalable data preprocessing, feature engineering, feature selection, and machine learning pipelines. We manually created just under 200 additional features to describe tweet context. The results indicate that features describing users' prior engagement history and the popularity of hashtags and links in the tweet were the most informative. We also found that factors such as the prediction algorithm, training dataset size, training dataset sampling method, and feature selection significantly affect the results. After comparing the best results of our context-only prediction models with content-only models and with models developed by the Challenge winners, we identified that the context-based models underperformed in terms of the RCE score. This work thus concludes by situating this discrepancy and proposing potential improvements to our implementation, which is shared in a public git repository.
    Untargeted White-box Adversarial Attack with Heuristic Defence Methods in Real-time Deep Learning based Network Intrusion Detection System. (arXiv:2310.03334v1 [cs.LG])
    Network Intrusion Detection System (NIDS) is a key component in securing the computer network from various cyber security threats and network attacks. However, consider an unfortunate situation where the NIDS is itself attacked and vulnerable more specifically, we can say, How to defend the defender?. In Adversarial Machine Learning (AML), the malicious actors aim to fool the Machine Learning (ML) and Deep Learning (DL) models to produce incorrect predictions with intentionally crafted adversarial examples. These adversarial perturbed examples have become the biggest vulnerability of ML and DL based systems and are major obstacles to their adoption in real-time and mission-critical applications such as NIDS. AML is an emerging research domain, and it has become a necessity for the in-depth study of adversarial attacks and their defence strategies to safeguard the computer network from various cyber security threads. In this research work, we aim to cover important aspects related to NIDS, adversarial attacks and its defence mechanism to increase the robustness of the ML and DL based NIDS. We implemented four powerful adversarial attack techniques, namely, Fast Gradient Sign Method (FGSM), Jacobian Saliency Map Attack (JSMA), Projected Gradient Descent (PGD) and Carlini & Wagner (C&W) in NIDS. We analyzed its performance in terms of various performance metrics in detail. Furthermore, the three heuristics defence strategies, i.e., Adversarial Training (AT), Gaussian Data Augmentation (GDA) and High Confidence (HC), are implemented to improve the NIDS robustness under adversarial attack situations. The complete workflow is demonstrated in real-time network with data packet flow. This research work provides the overall background for the researchers interested in AML and its implementation from a computer network security point of view.
    Dual Prompt Tuning for Domain-Aware Federated Learning. (arXiv:2310.03103v1 [cs.LG])
    Federated learning is a distributed machine learning paradigm that allows multiple clients to collaboratively train a shared model with their local data. Nonetheless, conventional federated learning algorithms often struggle to generalize well due to the ubiquitous domain shift across clients. In this work, we consider a challenging yet realistic federated learning scenario where the training data of each client originates from different domains. We address the challenges of domain shift by leveraging the technique of prompt learning, and propose a novel method called Federated Dual Prompt Tuning (Fed-DPT). Specifically, Fed-DPT employs a pre-trained vision-language model and then applies both visual and textual prompt tuning to facilitate domain adaptation over decentralized data. Extensive experiments of Fed-DPT demonstrate its significant effectiveness in domain-aware federated learning. With a pre-trained CLIP model (ViT-Base as image encoder), the proposed Fed-DPT attains 68.4% average accuracy over six domains in the DomainNet dataset, which improves the original CLIP by a large margin of 14.8%.
  • Open

    Banach Space Optimality of Neural Architectures With Multivariate Nonlinearities. (arXiv:2310.03696v1 [stat.ML])
    We investigate the variational optimality (specifically, the Banach space optimality) of a large class of neural architectures with multivariate nonlinearities/activation functions. To that end, we construct a new family of Banach spaces defined via a regularization operator and the $k$-plane transform. We prove a representer theorem that states that the solution sets to learning problems posed over these Banach spaces are completely characterized by neural architectures with multivariate nonlinearities. These optimal architectures have skip connections and are tightly connected to orthogonal weight normalization and multi-index models, both of which have received considerable interest in the neural network community. Our framework is compatible with a number of classical nonlinearities including the rectified linear unit (ReLU) activation function, the norm activation function, and the radial basis functions found in the theory of thin-plate/polyharmonic splines. We also show that the underlying spaces are special instances of reproducing kernel Banach spaces and variation spaces. Our results shed light on the regularity of functions learned by neural networks trained on data, particularly with multivariate nonlinearities, and provide new theoretical motivation for several architectural choices found in practice.
    Stable Training of Probabilistic Models Using the Leave-One-Out Maximum Log-Likelihood Objective. (arXiv:2310.03556v1 [stat.ML])
    Probabilistic modelling of power systems operation and planning processes depends on data-driven methods, which require sufficiently large datasets. When historical data lacks this, it is desired to model the underlying data generation mechanism as a probability distribution to assess the data quality and generate more data, if needed. Kernel density estimation (KDE) based models are popular choices for this task, but they fail to adapt to data regions with varying densities. In this paper, an adaptive KDE model is employed to circumvent this, where each kernel in the model has an individual bandwidth. The leave-one-out maximum log-likelihood (LOO-MLL) criterion is proposed to prevent the singular solutions that the regular MLL criterion gives rise to, and it is proven that LOO-MLL prevents these. Relying on this guaranteed robustness, the model is extended by assigning learnable weights to the kernels. In addition, a modified expectation-maximization algorithm is employed to accelerate the optimization speed reliably. The performance of the proposed method and models are exhibited on two power systems datasets using different statistical tests and by comparison with Gaussian mixture models. Results show that the proposed models have promising performance, in addition to their singularity prevention guarantees.
    On Convergence of Federated Averaging Langevin Dynamics. (arXiv:2112.05120v4 [stat.ML] UPDATED)
    We propose a federated averaging Langevin algorithm (FA-LD) for uncertainty quantification and mean predictions with distributed clients. In particular, we generalize beyond normal posterior distributions and consider a general class of models. We develop theoretical guarantees for FA-LD for strongly log-concave distributions with non-i.i.d data and study how the injected noise and the stochastic-gradient noise, the heterogeneity of data, and the varying learning rates affect the convergence. Such an analysis sheds light on the optimal choice of local updates to minimize communication costs. Important to our approach is that the communication efficiency does not deteriorate with the injected noise in the Langevin algorithms. In addition, we examine in our FA-LD algorithm both independent and correlated noise used over different clients. We observe there is a trade-off between the pairs among communication, accuracy, and data privacy. As local devices may become inactive in federated networks, we also show convergence results based on different averaging schemes where only partial device updates are available. In such a case, we discover an additional bias that does not decay to zero.
    Anytime-valid t-tests and confidence sequences for Gaussian means with unknown variance. (arXiv:2310.03722v1 [math.ST])
    In 1976, Lai constructed a nontrivial confidence sequence for the mean $\mu$ of a Gaussian distribution with unknown variance $\sigma$. Curiously, he employed both an improper (right Haar) mixture over $\sigma$ and an improper (flat) mixture over $\mu$. Here, we elaborate carefully on the details of his construction, which use generalized nonintegrable martingales and an extended Ville's inequality. While this does yield a sequential t-test, it does not yield an ``e-process'' (due to the nonintegrability of his martingale). In this paper, we develop two new e-processes and confidence sequences for the same setting: one is a test martingale in a reduced filtration, while the other is an e-process in the canonical data filtration. These are respectively obtained by swapping Lai's flat mixture for a Gaussian mixture, and swapping the right Haar mixture over $\sigma$ with the maximum likelihood estimate under the null, as done in universal inference. We also analyze the width of resulting confidence sequences, which have a curious dependence on the error probability $\alpha$. Numerical experiments are provided along the way to compare and contrast the various approaches.
    A Probabilistic Graph Coupling View of Dimension Reduction. (arXiv:2201.13053v3 [math.PR] UPDATED)
    Most popular dimension reduction (DR) methods like t-SNE and UMAP are based on minimizing a cost between input and latent pairwise similarities. Though widely used, these approaches lack clear probabilistic foundations to enable a full understanding of their properties and limitations. To that extent, we introduce a unifying statistical framework based on the coupling of hidden graphs using cross entropy. These graphs induce a Markov random field dependency structure among the observations in both input and latent spaces. We show that existing pairwise similarity DR methods can be retrieved from our framework with particular choices of priors for the graphs. Moreover this reveals that these methods suffer from a statistical deficiency that explains poor performances in conserving coarse-grain dependencies. Our model is leveraged and extended to address this issue while new links are drawn with Laplacian eigenmaps and PCA.
    CLEVRER-Humans: Describing Physical and Causal Events the Human Way. (arXiv:2310.03635v1 [cs.AI])
    Building machines that can reason about physical events and their causal relationships is crucial for flexible interaction with the physical world. However, most existing physical and causal reasoning benchmarks are exclusively based on synthetically generated events and synthetic natural language descriptions of causal relationships. This design brings up two issues. First, there is a lack of diversity in both event types and natural language descriptions; second, causal relationships based on manually-defined heuristics are different from human judgments. To address both shortcomings, we present the CLEVRER-Humans benchmark, a video reasoning dataset for causal judgment of physical events with human labels. We employ two techniques to improve data collection efficiency: first, a novel iterative event cloze task to elicit a new representation of events in videos, which we term Causal Event Graphs (CEGs); second, a data augmentation technique based on neural language generative models. We convert the collected CEGs into questions and answers to be consistent with prior work. Finally, we study a collection of baseline approaches for CLEVRER-Humans question-answering, highlighting the great challenges set forth by our benchmark.
    High-dimensional Bayesian Optimization with Group Testing. (arXiv:2310.03515v1 [cs.LG])
    Bayesian optimization is an effective method for optimizing expensive-to-evaluate black-box functions. High-dimensional problems are particularly challenging as the surrogate model of the objective suffers from the curse of dimensionality, which makes accurate modeling difficult. We propose a group testing approach to identify active variables to facilitate efficient optimization in these domains. The proposed algorithm, Group Testing Bayesian Optimization (GTBO), first runs a testing phase where groups of variables are systematically selected and tested on whether they influence the objective. To that end, we extend the well-established theory of group testing to functions of continuous ranges. In the second phase, GTBO guides optimization by placing more importance on the active dimensions. By exploiting the axis-aligned subspace assumption, GTBO is competitive against state-of-the-art methods on several synthetic and real-world high-dimensional optimization tasks. Furthermore, GTBO aids in the discovery of active parameters in applications, thereby enhancing practitioners' understanding of the problem at hand.
    Leveraging Model-based Trees as Interpretable Surrogate Models for Model Distillation. (arXiv:2310.03112v1 [stat.ML])
    Surrogate models play a crucial role in retrospectively interpreting complex and powerful black box machine learning models via model distillation. This paper focuses on using model-based trees as surrogate models which partition the feature space into interpretable regions via decision rules. Within each region, interpretable models based on additive main effects are used to approximate the behavior of the black box model, striking for an optimal balance between interpretability and performance. Four model-based tree algorithms, namely SLIM, GUIDE, MOB, and CTree, are compared regarding their ability to generate such surrogate models. We investigate fidelity, interpretability, stability, and the algorithms' capability to capture interaction effects through appropriate splits. Based on our comprehensive analyses, we finally provide an overview of user-specific recommendations.
    Sharpness-Aware Minimization and the Edge of Stability. (arXiv:2309.12488v3 [cs.LG] UPDATED)
    Recent experiments have shown that, often, when training a neural network with gradient descent (GD) with a step size $\eta$, the operator norm of the Hessian of the loss grows until it approximately reaches $2/\eta$, after which it fluctuates around this value. The quantity $2/\eta$ has been called the "edge of stability" based on consideration of a local quadratic approximation of the loss. We perform a similar calculation to arrive at an "edge of stability" for Sharpness-Aware Minimization (SAM), a variant of GD which has been shown to improve its generalization. Unlike the case for GD, the resulting SAM-edge depends on the norm of the gradient. Using three deep learning training tasks, we see empirically that SAM operates on the edge of stability identified by this analysis.
    Quantitative CLTs in Deep Neural Networks. (arXiv:2307.06092v4 [cs.LG] UPDATED)
    We study the distribution of a fully connected neural network with random Gaussian weights and biases in which the hidden layer widths are proportional to a large constant $n$. Under mild assumptions on the non-linearity, we obtain quantitative bounds on normal approximations valid at large but finite $n$ and any fixed network depth. Our theorems show both for the finite-dimensional distributions and the entire process, that the distance between a random fully connected network (and its derivatives) to the corresponding infinite width Gaussian process scales like $n^{-\gamma}$ for $\gamma>0$, with the exponent depending on the metric used to measure discrepancy. Our bounds are strictly stronger in terms of their dependence on network width than any previously available in the literature; in the one-dimensional case, we also prove that they are optimal, i.e., we establish matching lower bounds.
    Sparse Deep Learning for Time Series Data: Theory and Applications. (arXiv:2310.03243v1 [stat.ML])
    Sparse deep learning has become a popular technique for improving the performance of deep neural networks in areas such as uncertainty quantification, variable selection, and large-scale network compression. However, most existing research has focused on problems where the observations are independent and identically distributed (i.i.d.), and there has been little work on the problems where the observations are dependent, such as time series data and sequential data in natural language processing. This paper aims to address this gap by studying the theory for sparse deep learning with dependent data. We show that sparse recurrent neural networks (RNNs) can be consistently estimated, and their predictions are asymptotically normally distributed under appropriate assumptions, enabling the prediction uncertainty to be correctly quantified. Our numerical results show that sparse deep learning outperforms state-of-the-art methods, such as conformal predictions, in prediction uncertainty quantification for time series data. Furthermore, our results indicate that the proposed method can consistently identify the autoregressive order for time series data and outperform existing methods in large-scale model compression. Our proposed method has important practical implications in fields such as finance, healthcare, and energy, where both accurate point estimates and prediction uncertainty quantification are of concern.
    Interpolating between Clustering and Dimensionality Reduction with Gromov-Wasserstein. (arXiv:2310.03398v1 [cs.LG])
    We present a versatile adaptation of existing dimensionality reduction (DR) objectives, enabling the simultaneous reduction of both sample and feature sizes. Correspondances between input and embedding samples are computed through a semi-relaxed Gromov-Wasserstein optimal transport (OT) problem. When the embedding sample size matches that of the input, our model recovers classical popular DR models. When the embedding's dimensionality is unconstrained, we show that the OT plan delivers a competitive hard clustering. We emphasize the importance of intermediate stages that blend DR and clustering for summarizing real data and apply our method to visualize datasets of images.
    Abstractors and relational cross-attention: An inductive bias for explicit relational reasoning in Transformers. (arXiv:2304.00195v3 [stat.ML] UPDATED)
    An extension of Transformers is proposed that enables explicit relational reasoning through a novel module called the Abstractor. At the core of the Abstractor is a variant of attention called relational cross-attention. The approach is motivated by an architectural inductive bias for relational learning that disentangles relational information from extraneous features about individual objects. This enables explicit relational reasoning, supporting abstraction and generalization from limited data. The Abstractor is first evaluated on simple discriminative relational tasks and compared to existing relational architectures. Next, the Abstractor is evaluated on purely relational sequence-to-sequence tasks, where dramatic improvements are seen in sample efficiency compared to standard Transformers. Finally, Abstractors are evaluated on a collection of tasks based on mathematical problem solving, where modest but consistent improvements in performance and sample efficiency are observed.
    Joint Group Invariant Functions on Data-Parameter Domain Induce Universal Neural Networks. (arXiv:2310.03530v1 [cs.LG])
    The symmetry and geometry of input data are considered to be encoded in the internal data representation inside the neural network, but the specific encoding rule has been less investigated. By focusing on a joint group invariant function on the data-parameter domain, we present a systematic rule to find a dual group action on the parameter domain from a group action on the data domain. Further, we introduce generalized neural networks induced from the joint invariant functions, and present a new group theoretic proof of their universality theorems by using Schur's lemma. Since traditional universality theorems were demonstrated based on functional analytical methods, this study sheds light on the group theoretic aspect of the approximation theory, connecting geometric deep learning to abstract harmonic analysis.
    Non-Asymptotic Analysis of Ensemble Kalman Updates: Effective Dimension and Localization. (arXiv:2208.03246v3 [stat.ML] UPDATED)
    Many modern algorithms for inverse problems and data assimilation rely on ensemble Kalman updates to blend prior predictions with observed data. Ensemble Kalman methods often perform well with a small ensemble size, which is essential in applications where generating each particle is costly. This paper develops a non-asymptotic analysis of ensemble Kalman updates that rigorously explains why a small ensemble size suffices if the prior covariance has moderate effective dimension due to fast spectrum decay or approximate sparsity. We present our theory in a unified framework, comparing several implementations of ensemble Kalman updates that use perturbed observations, square root filtering, and localization. As part of our analysis, we develop new dimension-free covariance estimation bounds for approximately sparse matrices that may be of independent interest.
    SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks. (arXiv:2310.03684v1 [cs.LG])
    Despite efforts to align large language models (LLMs) with human values, widely-used LLMs such as GPT, Llama, Claude, and PaLM are susceptible to jailbreaking attacks, wherein an adversary fools a targeted LLM into generating objectionable content. To address this vulnerability, we propose SmoothLLM, the first algorithm designed to mitigate jailbreaking attacks on LLMs. Based on our finding that adversarially-generated prompts are brittle to character-level changes, our defense first randomly perturbs multiple copies of a given input prompt, and then aggregates the corresponding predictions to detect adversarial inputs. SmoothLLM reduces the attack success rate on numerous popular LLMs to below one percentage point, avoids unnecessary conservatism, and admits provable guarantees on attack mitigation. Moreover, our defense uses exponentially fewer queries than existing attacks and is compatible with any LLM.
    Unpaired Image-to-Image Translation via Neural Schr\"odinger Bridge. (arXiv:2305.15086v2 [cs.CV] UPDATED)
    Diffusion models are a powerful class of generative models which simulate stochastic differential equations (SDEs) to generate data from noise. Although diffusion models have achieved remarkable progress in recent years, they have limitations in the unpaired image-to-image translation tasks due to the Gaussian prior assumption. Schr\"odinger Bridge (SB), which learns an SDE to translate between two arbitrary distributions, have risen as an attractive solution to this problem. However, none of SB models so far have been successful at unpaired translation between high-resolution images. In this work, we propose the Unpaired Neural Schr\"odinger Bridge (UNSB), which expresses SB problem as a sequence of adversarial learning problems. This allows us to incorporate advanced discriminators and regularization to learn a SB between unpaired data. We demonstrate that UNSB is scalable and successfully solves various unpaired image-to-image translation tasks. Code: \url{https://github.com/cyclomon/UNSB}
    Deep Momentum Multi-Marginal Schr\"odinger Bridge. (arXiv:2303.01751v3 [stat.ML] UPDATED)
    It is a crucial challenge to reconstruct population dynamics using unlabeled samples from distributions at coarse time intervals. Recent approaches such as flow-based models or Schr\"odinger Bridge (SB) models have demonstrated appealing performance, yet the inferred sample trajectories either fail to account for the underlying stochasticity or are $\underline{D}$eep $\underline{M}$omentum Multi-Marginal $\underline{S}$chr\"odinger $\underline{B}$ridge(DMSB), a novel computational framework that learns the smooth measure-valued spline for stochastic systems that satisfy position marginal constraints across time. By tailoring the celebrated Bregman Iteration and extending the Iteration Proportional Fitting to phase space, we manage to handle high-dimensional multi-marginal trajectory inference tasks efficiently. Our algorithm outperforms baselines significantly, as evidenced by experiments for synthetic datasets and a real-world single-cell RNA sequence dataset. Additionally, the proposed approach can reasonably reconstruct the evolution of velocity distribution, from position snapshots only, when there is a ground truth velocity that is nevertheless inaccessible.
    Optimal 1-Wasserstein Distance for WGANs. (arXiv:2201.02824v2 [stat.ML] UPDATED)
    The mathematical forces at work behind Generative Adversarial Networks raise challenging theoretical issues. Motivated by the important question of characterizing the geometrical properties of the generated distributions, we provide a thorough analysis of Wasserstein GANs (WGANs) in both the finite sample and asymptotic regimes. We study the specific case where the latent space is univariate and derive results valid regardless of the dimension of the output space. We show in particular that for a fixed sample size, the optimal WGANs are closely linked with connected paths minimizing the sum of the squared Euclidean distances between the sample points. We also highlight the fact that WGANs are able to approach (for the 1-Wasserstein distance) the target distribution as the sample size tends to infinity, at a given convergence rate and provided the family of generative Lipschitz functions grows appropriately. We derive in passing new results on optimal transport theory in the semi-discrete setting.
    Towards Inferential Reproducibility of Machine Learning Research. (arXiv:2302.04054v6 [cs.LG] UPDATED)
    Reliability of machine learning evaluation -- the consistency of observed evaluation scores across replicated model training runs -- is affected by several sources of nondeterminism which can be regarded as measurement noise. Current tendencies to remove noise in order to enforce reproducibility of research results neglect inherent nondeterminism at the implementation level and disregard crucial interaction effects between algorithmic noise factors and data properties. This limits the scope of conclusions that can be drawn from such experiments. Instead of removing noise, we propose to incorporate several sources of variance, including their interaction with data properties, into an analysis of significance and reliability of machine learning evaluation, with the aim to draw inferences beyond particular instances of trained models. We show how to use linear mixed effects models (LMEMs) to analyze performance evaluation scores, and to conduct statistical inference with a generalized likelihood ratio test (GLRT). This allows us to incorporate arbitrary sources of noise like meta-parameter variations into statistical significance testing, and to assess performance differences conditional on data properties. Furthermore, a variance component analysis (VCA) enables the analysis of the contribution of noise sources to overall variance and the computation of a reliability coefficient by the ratio of substantial to total variance.
    Gradient Flows for Sampling: Mean-Field Models, Gaussian Approximations and Affine Invariance. (arXiv:2302.11024v5 [stat.ML] UPDATED)
    Sampling a probability distribution with an unknown normalization constant is a fundamental problem in computational science and engineering. This task may be cast as an optimization problem over all probability measures, and an initial distribution can be evolved to the desired minimizer dynamically via gradient flows. Mean-field models, whose law is governed by the gradient flow in the space of probability measures, may also be identified; particle approximations of these mean-field models form the basis of algorithms. The gradient flow approach is also the basis of algorithms for variational inference, in which the optimization is performed over a parameterized family of probability distributions such as Gaussians, and the underlying gradient flow is restricted to the parameterized family. By choosing different energy functionals and metrics for the gradient flow, different algorithms with different convergence properties arise. In this paper, we concentrate on the Kullback-Leibler divergence after showing that, up to scaling, it has the unique property that the gradient flows resulting from this choice of energy do not depend on the normalization constant. For the metrics, we focus on variants of the Fisher-Rao, Wasserstein, and Stein metrics; we introduce the affine invariance property for gradient flows, and their corresponding mean-field models, determine whether a given metric leads to affine invariance, and modify it to make it affine invariant if it does not. We study the resulting gradient flows in both probability density space and Gaussian space. The flow in the Gaussian space may be understood as a Gaussian approximation of the flow. We demonstrate that the Gaussian approximation based on the metric and through moment closure coincide, establish connections between them, and study their long-time convergence properties showing the advantages of affine invariance.
    Stochastic interpolants with data-dependent couplings. (arXiv:2310.03725v1 [cs.LG])
    Generative models inspired by dynamical transport of measure -- such as flows and diffusions -- construct a continuous-time map between two probability densities. Conventionally, one of these is the target density, only accessible through samples, while the other is taken as a simple base density that is data-agnostic. In this work, using the framework of stochastic interpolants, we formalize how to \textit{couple} the base and the target densities. This enables us to incorporate information about class labels or continuous embeddings to construct dynamical transport maps that serve as conditional generative models. We show that these transport maps can be learned by solving a simple square loss regression problem analogous to the standard independent setting. We demonstrate the usefulness of constructing dependent couplings in practice through experiments in super-resolution and in-painting.
    Maximum Likelihood Estimation of Latent Variable Structural Equation Models: A Neural Network Approach. (arXiv:2309.14073v2 [stat.ML] UPDATED)
    We propose a graphical structure for structural equation models that is stable under marginalization under linearity and Gaussianity assumptions. We show that computing the maximum likelihood estimation of this model is equivalent to training a neural network. We implement a GPU-based algorithm that computes the maximum likelihood estimation of these models.  ( 2 min )
    Assessment of the Reliablity of a Model's Decision by Generalizing Attribution to the Wavelet Domain. (arXiv:2305.14979v3 [cs.CV] UPDATED)
    Neural networks have shown remarkable performance in computer vision, but their deployment in numerous scientific and technical fields is challenging due to their black-box nature. Scientists and practitioners need to evaluate the reliability of a decision, i.e., to know simultaneously if a model relies on the relevant features and whether these features are robust to image corruptions. Existing attribution methods aim to provide human-understandable explanations by highlighting important regions in the image domain, but fail to fully characterize a decision process's reliability. To bridge this gap, we introduce the Wavelet sCale Attribution Method (WCAM), a generalization of attribution from the pixel domain to the space-scale domain using wavelet transforms. Attribution in the wavelet domain reveals where {\it and} on what scales the model focuses, thus enabling us to assess whether a decision is reliable.  ( 3 min )
    Characterization of causal ancestral graphs for time series with latent confounders. (arXiv:2112.08417v2 [stat.ME] UPDATED)
    In this paper, we introduce a novel class of graphical models for representing time lag specific causal relationships and independencies of multivariate time series with unobserved confounders. We completely characterize these graphs and show that they constitute proper subsets of the currently employed model classes. As we show, from the novel graphs one can thus draw stronger causal inferences -- without additional assumptions. We further introduce a graphical representation of Markov equivalence classes of the novel graphs. This graphical representation contains more causal knowledge than what current state-of-the-art causal discovery algorithms learn.  ( 2 min )
    Network Cascade Vulnerability using Constrained Bayesian Optimization. (arXiv:2304.14420v2 [cs.SI] UPDATED)
    Measures of power grid vulnerability are often assessed by the amount of damage an adversary can exact on the network. However, the cascading impact of such attacks is often overlooked, even though cascades are one of the primary causes of large-scale blackouts. This paper explores modifications of transmission line protection settings as candidates for adversarial attacks, which can remain undetectable as long as the network equilibrium state remains unaltered. This forms the basis of a black-box function in a Bayesian optimization procedure, where the objective is to find protection settings that maximize network degradation due to cascading. Notably, our proposed method is agnostic to the choice of the cascade simulator and its underlying assumptions. Numerical experiments reveal that, against conventional wisdom, maximally misconfiguring the protection settings of all network lines does not cause the most cascading. More surprisingly, even when the degree of misconfiguration is limited due to resource constraints, it is still possible to find settings that produce cascades comparable in severity to instances where there are no resource constraints.  ( 2 min )
    A Latent Variable Approach for Non-Hierarchical Multi-Fidelity Adaptive Sampling. (arXiv:2310.03298v1 [stat.ML])
    Multi-fidelity (MF) methods are gaining popularity for enhancing surrogate modeling and design optimization by incorporating data from various low-fidelity (LF) models. While most existing MF methods assume a fixed dataset, adaptive sampling methods that dynamically allocate resources among fidelity models can achieve higher efficiency in the exploring and exploiting the design space. However, most existing MF methods rely on the hierarchical assumption of fidelity levels or fail to capture the intercorrelation between multiple fidelity levels and utilize it to quantify the value of the future samples and navigate the adaptive sampling. To address this hurdle, we propose a framework hinged on a latent embedding for different fidelity models and the associated pre-posterior analysis to explicitly utilize their correlation for adaptive sampling. In this framework, each infill sampling iteration includes two steps: We first identify the location of interest with the greatest potential improvement using the high-fidelity (HF) model, then we search for the next sample across all fidelity levels that maximize the improvement per unit cost at the location identified in the first step. This is made possible by a single Latent Variable Gaussian Process (LVGP) model that maps different fidelity models into an interpretable latent space to capture their correlations without assuming hierarchical fidelity levels. The LVGP enables us to assess how LF sampling candidates will affect HF response with pre-posterior analysis and determine the next sample with the best benefit-to-cost ratio. Through test cases, we demonstrate that the proposed method outperforms the benchmark methods in both MF global fitting (GF) and Bayesian Optimization (BO) problems in convergence rate and robustness. Moreover, the method offers the flexibility to switch between GF and BO by simply changing the acquisition function.  ( 3 min )
    Learning Robust Statistics for Simulation-based Inference under Model Misspecification. (arXiv:2305.15871v3 [stat.ML] UPDATED)
    Simulation-based inference (SBI) methods such as approximate Bayesian computation (ABC), synthetic likelihood, and neural posterior estimation (NPE) rely on simulating statistics to infer parameters of intractable likelihood models. However, such methods are known to yield untrustworthy and misleading inference outcomes under model misspecification, thus hindering their widespread applicability. In this work, we propose the first general approach to handle model misspecification that works across different classes of SBI methods. Leveraging the fact that the choice of statistics determines the degree of misspecification in SBI, we introduce a regularized loss function that penalises those statistics that increase the mismatch between the data and the model. Taking NPE and ABC as use cases, we demonstrate the superior performance of our method on high-dimensional time-series models that are artificially misspecified. We also apply our method to real data from the field of radio propagation where the model is known to be misspecified. We show empirically that the method yields robust inference in misspecified scenarios, whilst still being accurate when the model is well-specified.  ( 2 min )
    Rethinking Fairness for Human-AI Collaboration. (arXiv:2310.03647v1 [cs.LG])
    Existing approaches to algorithmic fairness aim to ensure equitable outcomes if human decision-makers comply perfectly with algorithmic decisions. However, perfect compliance with the algorithm is rarely a reality or even a desirable outcome in human-AI collaboration. Yet, recent studies have shown that selective compliance with fair algorithms can amplify discrimination relative to the prior human policy. As a consequence, ensuring equitable outcomes requires fundamentally different algorithmic design principles that ensure robustness to the decision-maker's (a priori unknown) compliance pattern. We define the notion of compliance-robustly fair algorithmic recommendations that are guaranteed to (weakly) improve fairness in decisions, regardless of the human's compliance pattern. We propose a simple optimization strategy to identify the best performance-improving compliance-robustly fair policy. However, we show that it may be infeasible to design algorithmic recommendations that are simultaneously fair in isolation, compliance-robustly fair, and more accurate than the human policy; thus, if our goal is to improve the equity and accuracy of human-AI collaboration, it may not be desirable to enforce traditional fairness constraints.  ( 2 min )
    Deep Ridgelet Transform: Voice with Koopman Operator Proves Universality of Formal Deep Networks. (arXiv:2310.03529v1 [cs.LG])
    We identify hidden layers inside a DNN with group actions on the data space, and formulate the DNN as a dual voice transform with respect to Koopman operator, a linear representation of the group action. Based on the group theoretic arguments, particularly by using Schur's lemma, we show a simple proof of the universality of those DNNs.  ( 2 min )
    Learning Energy-Based Prior Model with Diffusion-Amortized MCMC. (arXiv:2310.03218v1 [cs.LG])
    Latent space Energy-Based Models (EBMs), also known as energy-based priors, have drawn growing interests in the field of generative modeling due to its flexibility in the formulation and strong modeling power of the latent space. However, the common practice of learning latent space EBMs with non-convergent short-run MCMC for prior and posterior sampling is hindering the model from further progress; the degenerate MCMC sampling quality in practice often leads to degraded generation quality and instability in training, especially with highly multi-modal and/or high-dimensional target distributions. To remedy this sampling issue, in this paper we introduce a simple but effective diffusion-based amortization method for long-run MCMC sampling and develop a novel learning algorithm for the latent space EBM based on it. We provide theoretical evidence that the learned amortization of MCMC is a valid long-run MCMC sampler. Experiments on several image modeling benchmark datasets demonstrate the superior performance of our method compared with strong counterparts  ( 2 min )
    On the Implicit Bias of Adam. (arXiv:2309.00079v3 [cs.LG] UPDATED)
    In previous literature, backward error analysis was used to find ordinary differential equations (ODEs) approximating the gradient descent trajectory. It was found that finite step sizes implicitly regularize solutions because terms appearing in the ODEs penalize the two-norm of the loss gradients. We prove that the existence of similar implicit regularization in RMSProp and Adam depends on their hyperparameters and the training stage, but with a different "norm" involved: the corresponding ODE terms either penalize the (perturbed) one-norm of the loss gradients or, on the contrary, hinder its decrease (the latter case being typical). We also conduct numerical experiments and discuss how the proven facts can influence generalization.  ( 2 min )
    Analysis of learning a flow-based generative model from limited sample complexity. (arXiv:2310.03575v1 [stat.ML])
    We study the problem of training a flow-based generative model, parametrized by a two-layer autoencoder, to sample from a high-dimensional Gaussian mixture. We provide a sharp end-to-end analysis of the problem. First, we provide a tight closed-form characterization of the learnt velocity field, when parametrized by a shallow denoising auto-encoder trained on a finite number $n$ of samples from the target distribution. Building on this analysis, we provide a sharp description of the corresponding generative flow, which pushes the base Gaussian density forward to an approximation of the target density. In particular, we provide closed-form formulae for the distance between the mean of the generated mixture and the mean of the target mixture, which we show decays as $\Theta_n(\frac{1}{n})$. Finally, this rate is shown to be in fact Bayes-optimal.  ( 2 min )
    Towards Optimal Neural Networks: the Role of Sample Splitting in Hyperparameter Selection. (arXiv:2307.07726v2 [stat.ML] UPDATED)
    When artificial neural networks have demonstrated exceptional practical success in a variety of domains, investigations into their theoretical characteristics, such as their approximation power, statistical properties, and generalization performance, have concurrently made significant strides. In this paper, we construct a novel theory for understanding the effectiveness of neural networks, which offers a perspective distinct from prior research. Specifically, we explore the rationale underlying a common practice during the construction of neural network models: sample splitting. Our findings indicate that the optimal hyperparameters derived from sample splitting can enable a neural network model that asymptotically minimizes the prediction risk. We conduct extensive experiments across different application scenarios and network architectures, and the results manifest our theory's effectiveness.  ( 2 min )
    Beyond Stationarity: Convergence Analysis of Stochastic Softmax Policy Gradient Methods. (arXiv:2310.02671v1 [math.OC] CROSS LISTED)
    Markov Decision Processes (MDPs) are a formal framework for modeling and solving sequential decision-making problems. In finite-time horizons such problems are relevant for instance for optimal stopping or specific supply chain problems, but also in the training of large language models. In contrast to infinite horizon MDPs optimal policies are not stationary, policies must be learned for every single epoch. In practice all parameters are often trained simultaneously, ignoring the inherent structure suggested by dynamic programming. This paper introduces a combination of dynamic programming and policy gradient called dynamic policy gradient, where the parameters are trained backwards in time. For the tabular softmax parametrisation we carry out the convergence analysis for simultaneous and dynamic policy gradient towards global optima, both in the exact and sampled gradient settings without regularisation. It turns out that the use of dynamic policy gradient training much better exploits the structure of finite-time problems which is reflected in improved convergence bounds.  ( 2 min )
    Plug-and-Play Posterior Sampling under Mismatched Measurement and Prior Models. (arXiv:2310.03546v1 [stat.ML])
    Posterior sampling has been shown to be a powerful Bayesian approach for solving imaging inverse problems. The recent plug-and-play unadjusted Langevin algorithm (PnP-ULA) has emerged as a promising method for Monte Carlo sampling and minimum mean squared error (MMSE) estimation by combining physical measurement models with deep-learning priors specified using image denoisers. However, the intricate relationship between the sampling distribution of PnP-ULA and the mismatched data-fidelity and denoiser has not been theoretically analyzed. We address this gap by proposing a posterior-L2 pseudometric and using it to quantify an explicit error bound for PnP-ULA under mismatched posterior distribution. We numerically validate our theory on several inverse problems such as sampling from Gaussian mixture models and image deblurring. Our results suggest that the sensitivity of the sampling distribution of PnP-ULA to a mismatch in the measurement model and the denoiser can be precisely characterized.  ( 2 min )
    Molecule Design by Latent Prompt Transformer. (arXiv:2310.03253v1 [cs.LG])
    This paper proposes a latent prompt Transformer model for solving challenging optimization problems such as molecule design, where the goal is to find molecules with optimal values of a target chemical or biological property that can be computed by an existing software. Our proposed model consists of three components. (1) A latent vector whose prior distribution is modeled by a Unet transformation of a Gaussian white noise vector. (2) A molecule generation model that generates the string-based representation of molecule conditional on the latent vector in (1). We adopt the causal Transformer model that takes the latent vector in (1) as prompt. (3) A property prediction model that predicts the value of the target property of a molecule based on a non-linear regression on the latent vector in (1). We call the proposed model the latent prompt Transformer model. After initial training of the model on existing molecules and their property values, we then gradually shift the model distribution towards the region that supports desired values of the target property for the purpose of molecule design. Our experiments show that our proposed model achieves state of the art performances on several benchmark molecule design tasks.  ( 2 min )
    Non-Smooth Weakly-Convex Finite-sum Coupled Compositional Optimization. (arXiv:2310.03234v1 [math.OC])
    This paper investigates new families of compositional optimization problems, called $\underline{\bf n}$on-$\underline{\bf s}$mooth $\underline{\bf w}$eakly-$\underline{\bf c}$onvex $\underline{\bf f}$inite-sum $\underline{\bf c}$oupled $\underline{\bf c}$ompositional $\underline{\bf o}$ptimization (NSWC FCCO). There has been a growing interest in FCCO due to its wide-ranging applications in machine learning and AI, as well as its ability to address the shortcomings of stochastic algorithms based on empirical risk minimization. However, current research on FCCO presumes that both the inner and outer functions are smooth, limiting their potential to tackle a more diverse set of problems. Our research expands on this area by examining non-smooth weakly-convex FCCO, where the outer function is weakly convex and non-decreasing, and the inner function is weakly-convex. We analyze a single-loop algorithm and establish its complexity for finding an $\epsilon$-stationary point of the Moreau envelop of the objective function. Additionally, we also extend the algorithm to solving novel non-smooth weakly-convex tri-level finite-sum coupled compositional optimization problems, which feature a nested arrangement of three functions. Lastly, we explore the applications of our algorithms in deep learning for two-way partial AUC maximization and multi-instance two-way partial AUC maximization, using empirical studies to showcase the effectiveness of the proposed algorithms.  ( 2 min )
    Posterior Sampling Based on Gradient Flows of the MMD with Negative Distance Kernel. (arXiv:2310.03054v1 [stat.ML])
    We propose conditional flows of the maximum mean discrepancy (MMD) with the negative distance kernel for posterior sampling and conditional generative modeling. This MMD, which is also known as energy distance, has several advantageous properties like efficient computation via slicing and sorting. We approximate the joint distribution of the ground truth and the observations using discrete Wasserstein gradient flows and establish an error bound for the posterior distributions. Further, we prove that our particle flow is indeed a Wasserstein gradient flow of an appropriate functional. The power of our method is demonstrated by numerical examples including conditional image generation and inverse problems like superresolution, inpainting and computed tomography in low-dose and limited-angle settings.  ( 2 min )
    Sampling via Gradient Flows in the Space of Probability Measures. (arXiv:2310.03597v1 [stat.ML])
    Sampling a target probability distribution with an unknown normalization constant is a fundamental challenge in computational science and engineering. Recent work shows that algorithms derived by considering gradient flows in the space of probability measures open up new avenues for algorithm development. This paper makes three contributions to this sampling approach by scrutinizing the design components of such gradient flows. Any instantiation of a gradient flow for sampling needs an energy functional and a metric to determine the flow, as well as numerical approximations of the flow to derive algorithms. Our first contribution is to show that the Kullback-Leibler divergence, as an energy functional, has the unique property (among all f-divergences) that gradient flows resulting from it do not depend on the normalization constant of the target distribution. Our second contribution is to study the choice of metric from the perspective of invariance. The Fisher-Rao metric is known as the unique choice (up to scaling) that is diffeomorphism invariant. As a computationally tractable alternative, we introduce a relaxed, affine invariance property for the metrics and gradient flows. In particular, we construct various affine invariant Wasserstein and Stein gradient flows. Affine invariant gradient flows are shown to behave more favorably than their non-affine-invariant counterparts when sampling highly anisotropic distributions, in theory and by using particle methods. Our third contribution is to study, and develop efficient algorithms based on Gaussian approximations of the gradient flows; this leads to an alternative to particle methods. We establish connections between various Gaussian approximate gradient flows, discuss their relation to gradient methods arising from parametric variational inference, and study their convergence properties both theoretically and numerically.  ( 3 min )
    Variational Inference for GARCH-family Models. (arXiv:2310.03435v1 [stat.ML])
    The Bayesian estimation of GARCH-family models has been typically addressed through Monte Carlo sampling. Variational Inference is gaining popularity and attention as a robust approach for Bayesian inference in complex machine learning models; however, its adoption in econometrics and finance is limited. This paper discusses the extent to which Variational Inference constitutes a reliable and feasible alternative to Monte Carlo sampling for Bayesian inference in GARCH-like models. Through a large-scale experiment involving the constituents of the S&P 500 index, several Variational Inference optimizers, a variety of volatility models, and a case study, we show that Variational Inference is an attractive, remarkably well-calibrated, and competitive method for Bayesian learning.  ( 2 min )
    Demystifying Oversmoothing in Attention-Based Graph Neural Networks. (arXiv:2305.16102v2 [cs.LG] UPDATED)
    Oversmoothing in Graph Neural Networks (GNNs) refers to the phenomenon where increasing network depth leads to homogeneous node representations. While previous work has established that Graph Convolutional Networks (GCNs) exponentially lose expressive power, it remains controversial whether the graph attention mechanism can mitigate oversmoothing. In this work, we provide a definitive answer to this question through a rigorous mathematical analysis, by viewing attention-based GNNs as nonlinear time-varying dynamical systems and incorporating tools and techniques from the theory of products of inhomogeneous matrices and the joint spectral radius. We establish that, contrary to popular belief, the graph attention mechanism cannot prevent oversmoothing and loses expressive power exponentially. The proposed framework extends the existing results on oversmoothing for symmetric GCNs to a significantly broader class of GNN models, including random walk GCNs, Graph Attention Networks (GATs) and (graph) transformers. In particular, our analysis accounts for asymmetric, state-dependent and time-varying aggregation operators and a wide range of common nonlinear activation functions, such as ReLU, LeakyReLU, GELU and SiLU.  ( 2 min )

  • Open

    [D] What exactly does base multimodal mean?
    I here a lot of people say that models like flamingo and Idefics aren't really multimodal, that they just use clip models to give text captions to the transformer, that there not "base multimodal" what exactly does it mean? Is there a way to directly tokenize images to transformers? Are there major architectural changes, if so, how would they differ from GPT-2? submitted by /u/vatsadev [link] [comments]
    [R] AutoAgents: A Framework for Automatic Agent Generation - Peking University 2023 - Generates the for the task necessary amount of different Agents that are also able to use tools in their work!
    Paper: https://arxiv.org/abs/2309.17288v1 Github: https://github.com/LinkSoul-AI/AutoAgents Abstract: Large language models (LLMs) have enabled remarkable advances in automated task-solving with multi-agent systems. However, most existing LLM-based multi-agent approaches rely on predefined agents to handle simple tasks, limiting the adaptability of multi-agent collaboration to different scenarios. Therefore, we introduce AutoAgents, an innovative framework that adaptively generates and coordinates multiple specialized agents to build an AI team according to different tasks. Specifically, AutoAgents couples the relationship between tasks and roles by dynamically generating multiple required agents based on task content and planning solutions for the current task based on the generated expert agents. Multiple specialized agents collaborate with each other to efficiently accomplish tasks. Concurrently, an observer role is incorporated into the framework to reflect on the designated plans and agents' responses and improve upon them. Our experiments on various benchmarks demonstrate that AutoAgents generates more coherent and accurate solutions than the existing multi-agent methods. This underscores the significance of assigning different roles to different tasks and of team cooperation, offering new perspectives for tackling complex tasks. https://preview.redd.it/2jmnr73kymsb1.jpg?width=1663&format=pjpg&auto=webp&s=08f53d5da3d12e685c5d4b24f27628d880a917c1 https://preview.redd.it/jklyr73kymsb1.jpg?width=824&format=pjpg&auto=webp&s=6f69b2fc5ef4bda60553da0bb953bd3c07ad506b https://preview.redd.it/elatla3kymsb1.jpg?width=1029&format=pjpg&auto=webp&s=e7e508cedd17b4798c9f90bf1c089beff3042f4a ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [Project] LoRA from Scratch
    Hi there! I was interested in learning more about LoRA but I was having a hard time finding a good simple example of implementing LoRA, as most sources are training large models and use a combination of huggingface transformers and the loralib package the original LoRA authors wrote. As a result, I ended up writing a simple LoRA implementation from scratch in pytorch lightning, and I figured other people might find it helpful as a learning resource or springboard: https://github.com/sunildkumar/lora_from_scratch/tree/main submitted by /u/dragseon [link] [comments]  ( 9 min )
    [P] Tutorial: Benchmarking Bark text-to-speech on 26 consumer GPUs - Reading out 144K recipes
    In this project, we benchmarked Bark text-to-speech across 26 different consumer GPUs. The goal: To get Bark to read 144K food recipes from Food.com's recipe dataset. You can read the full tutorial here: https://blog.salad.com/bark-benchmark-text-to-speech/ Included: Architecture diagram, data preparation, inference server setup, queue worker, setting up container group & compiling the results Code-blocks included in the tutorial. Words per dollar for each GPU: https://preview.redd.it/6daqluu3omsb1.png?width=2000&format=png&auto=webp&s=bc4b74fe6ee80c2721ab324eb0d9a2d7c2f7ddb1 Although the latest cards are indeed much faster than older cards at performing the inference, there’s really a sweet spot for cost-performance in the lower end 30xx series cards. Conclusions As is often the case, there’s a clear trade-off here between cost and performance. Higher end cards are faster, but their disproportionate cost makes them more expensive per word spoken. The model’s median speed is surprisingly similar across GPU types, even though the peak performance can be quite different. Salad has a lot of RTX 3060 GPUs available, based on their relatively low speed, yet huge number of inferences performed over the test. No matter what GPU you select, you should be prepared for significant variability in performance. Qualitative: While bark’s speech is often impressively natural sounding, it does have a tendency to go off script sometimes. We’ve also made available audio from 1000 top-rated recipes, paired with the script it was trying to read. submitted by /u/SaladChefs [link] [comments]  ( 9 min )
    [R] Brown University Paper: Low-Resource Languages (Zulu, Scots Gaelic, Hmong, Guarani) Can Easily Jailbreak LLMs
    Researchers from Brown University presented a new study supporting that translating unsafe prompts into `low-resource languages` allows them to easily bypass safety measures in LLMs. By converting English inputs like "how to steal without getting caught" into Zulu and feeding to GPT-4, harmful responses slipped through 80% of the time. English prompts were blocked over 99% of the time, for comparison. The study benchmarked attacks across 12 diverse languages and categories: High-resource: English, Chinese, Arabic, Hindi Mid-resource: Ukrainian, Bengali, Thai, Hebrew Low-resource: Zulu, Scots Gaelic, Hmong, Guarani The low-resource languages showed serious vulnerability to generating harmful responses, with combined attack success rates of around 79%. Mid-resource language success rates were much lower at 22%, while high-resource languages showed minimal vulnerability at around 11% success. Attacks worked as well as state-of-the-art techniques without needing adversarial prompts. These languages are used by 1.2 billion speakers today and allows easy exploitation by translating prompts. The English-centric focus misses vulnerabilities in other languages. TLDR: Bypassing safety in AI chatbots is easy by translating prompts to low-resource languages (like Zulu, Scots Gaelic, Hmong, and Guarani). Shows gaps in multilingual safety training. Full summary Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [D] Textbook prerequisites
    What are the prerequisites to read the book: "probabilistic machine learning an introduction" by Kevin P. Murphy? submitted by /u/OneAdhesiveness2585 [link] [comments]
    [R] Moving Object Based Collision-Free Video Synopsis
    Webpage : Moving Object Based Collision-Free Video Synopsis (IEEE SMC 2018) (anton-jeran.github.io) Paper : Moving Object Based Collision-Free Video Synopsis | IEEE Conference Publication | IEEE Xplore Presentation : [IEEE SMC 2018] Moving Object Based Collision-Free Video Synopsis - YouTube submitted by /u/Snoo63916 [link] [comments]  ( 9 min )
    [P] MusicGen Streaming 🎵
    Faster MusicGen Generation with Streaming There's no need to wait for MusicGen to generate the full audio before you can start listening to the outputs ⏰ With streaming, you can play the audio as soon as the first chunk is ready 🎵 In practice, this reduces the latency to just 5s ⚡️ Check-out the demo: https://huggingface.co/spaces/sanchit-gandhi/musicgen-streaming How Does it Work? MusicGen is an auto-regressive transformer-based model, meaning generates audio codes (tokens) in a causal fashion. At each decoding step, the model generates a new set of audio codes, conditional on the text input and all previous audio codes. From the frame rate of the EnCodec model used to decode the generated codes to audio waveform, each set of generated audio codes corresponds to 0.02 seconds. This me…
    [D]/{R] simple question on generating a confusion matrix for object detection
    i have to generate a confusion matrix for object detection through my own code. if i have predicted Bounding Box A (BB-A) which matches to Ground Truth A (GT-A), and I have another predicted Bounding Box B (BB-B) with a lower score than BB-A, does BB-B count as a true positive/match? or is it considered a false positive given that there has already been a matched BB to GT-A? i.e., with matching bounding boxes for generating a confusion matrix, is it a one-to-one matching? or is it more like match one GT to as many predictions? submitted by /u/Alarmed-Broccoli2536 [link] [comments]
    [P] I'm using Instruct GPT to show anti-clickbait summaries on youtube videos
    submitted by /u/Wise-Astronaut-4047 [link] [comments]  ( 8 min )
    [D] Feature extraction for sets i.e. data of varying size
    Are there classical feature extraction methods that work on sets i.e. data of variable size? I'd like to start with a feature matrix X_in of shape N x f and have some feature mixing to arive at X_out N x h (N=size of set, f=input feature size, h=output feature size). Here, N can vary. For clarity, one set(containing N vectors of size f) is one sample. A dataset consists of many samples(each one being a set of varying size). Then I'd run this through a classical ML model. So, essentially, I'm looking for something like DeepSets or Transformers - can handle data of varying size and is permutation equivariant, but I don't wanna train for long. ​ https://fabianfuchsml.github.io/learningonsets/ submitted by /u/Mundane_Pay1506 [link] [comments]
    [D] How is neural ODEs as a field of study?
    Hi, I'm a 21yr old physics undergrad, and I have zero knowledge in neural networks / machine learning / so on. I have an opportunity to do a research project on neural ODEs, so I want to know more about the field: Is it an emerging field or is it mature and well-researched? What are my career outlooks if I take this project? Thank you. submitted by /u/moorelibqc17412 [link] [comments]  ( 9 min )
    [D] Non-convex functions with exactly one local minimum
    Rosenbrock function is non-convex, but has exactly one local minimum. Is there a specific name for such functions? Are there any theorems about them? Any special optimization algorithms? On the first glance, while being non-convex, they seem to be "easier" to optimize than functions that have multiple local minima, such as Rastrigin function. submitted by /u/Tomarchelone [link] [comments]
    [P] Talk to your Zendesk tickets with Weaviate’s Verba and dlt: A step by step guide
    Hi folks, we played around sticking production pipelines and vector dbs together to enable "talking to your data". We created an example with Zendesk, but it would work with any custom python generator or existing connectors. Project: Talk to your Zendesk tickets with Weaviate’s Verba and dlt: A step by step guide If you are interested to try more ready made connectors, to for example talk with your github or asana data or something else. Who are we? dlt, the open source loading library: https://pypi.org/project/dlt/ Like the demo? Give us a git star Want to discuss? join the dlt slack community submitted by /u/Thinker_Assignment [link] [comments]  ( 9 min )
    [D] EMNLP 2023 decisions thread
    When can we expect to get the decisions? Any idea folks? What can be a good cutoff for main or findings? submitted by /u/Ok_Swan3875 [link] [comments]
    [D] Parallelizing cheaper GPUs(rtx 4090) vs buying A100
    Hi. I am a college student and I am trying to run deep learning models (hopefully LLMs one day) and my laptop keep crashing because of ram issue. So I am going to build a new desktop. I am thinking of buying 2 rtx 4090 and Parallelizing them instead of buying A100 because buying 2 rtx 4090 is half the cost of buying A100. But is there a downside of Parallelizing vs buying a single gpu with large vram? If I am willing to take longer to train a model, can i use 3 rtx 4090 instead of a100 80gb model?? submitted by /u/ColumbiaGSAlum [link] [comments]
    [D] What's the SOTA model in Time Series Long term forecasting?
    I read https://arxiv.org/abs/2205.13504 which compare different transformer models. But now is 2023, I am not sure if any better models appear in this time series. ​ https://preview.redd.it/o6sihjqjrhsb1.png?width=1076&format=png&auto=webp&s=3db7d50590270bac52e7115e1e9903a6785957d2 submitted by /u/Trust_Ok [link] [comments]
    [R] Agent Instructs Large Language Models to be General Zero-Shot Reasoners
    Nicholas Crispino, Kyle Montgomery, Fankun Zeng, Dawn Song, Chenguang Wang Paper: https://arxiv.org/abs/2310.03710 Abstract: We introduce a method to improve the zero-shot reasoning abilities of large language models on general language understanding tasks. Specifically, we build an autonomous agent to instruct the reasoning process of large language models. We show this approach further unleashes the zero-shot reasoning abilities of large language models to more tasks. We study the performance of our method on a wide set of datasets spanning generation, classification, and reasoning. We show that our method generalizes to most tasks and obtains state-of-the-art zero-shot performance on 20 of the 29 datasets that we evaluate. For instance, our method boosts the performance of state-of-the-art large language models by a large margin, including Vicuna-13b (13.3%), Llama-2-70b-chat (23.2%), and GPT-3.5 Turbo (17.0%). Compared to zero-shot chain of thought, our improvement in reasoning is striking, with an average increase of 10.5%. With our method, Llama-2-70b-chat outperforms zero-shot GPT-3.5 Turbo by 10.2%. The code will be available at https://github.com/wang-research-lab/agentinstruct. submitted by /u/ncrispino [link] [comments]  ( 9 min )
    [D] How to compute the distance between two high-dimensional distributions?
    Hey all, I am generating a set of extra MNIST digits for a research project, and I am interested in somehow computing the distance between the distribution these digits represent and the distribution that the MNIST train set, for example, represents. The issue is that it seems like typical methods (Jensen-Shannon, Wasserstein, etc.) collapse at high dimensions. Is there a consensus solid approach to do this nowadays? Thanks! submitted by /u/SignificantSundae793 [link] [comments]  ( 9 min )
  • Open

    What will be the next big AI product for consumers?
    The next big thing in AI products for consumers is likely to be products that are more personalized, intelligent, and integrated into our daily lives. For example, we can expect to see more AI-powered personal assistants that can help us with a wider range of tasks, such as managing our schedules, making travel arrangements, and even providing companionship. We may also see more AI-powered devices in our homes, such as refrigerators that can track our food inventory and suggest recipes, or thermostats that can learn our heating and cooling preferences and adjust themselves accordingly. AI is also poised to revolutionize the way we interact with the world around us. For example, AI-powered translation apps could allow us to communicate with people from all over the world in real time. AI-…
    Big Tech's thirst for AI dominance may bring literal thirst for everyone else
    The increasing dominance of Big Tech in AI may lead to a literal thirst for water for everyone else, as data centers are projected to consume 450 million gallons of water daily by 2030. This poses a significant concern for drought-stricken regions, such as Spain's Talavera de la Reina, where a planned data facility could consume 176 million gallons annually. Data center operators require large amounts of energy, and the lack of transparency in measuring water usage exacerbates the issue. Only 39% of data centers measured their water usage last year, highlighting the need for greater transparency. The demand for computing power is outpacing sustainability efforts, creating a challenge for the industry. Even simple interactions with AI, like a 20-question conversation with ChatGPT, contribute to water consumption. Source : https://thehustle.co/big-tech-s-thirst-for-ai-dominance-may-bring-literal-thirst-for-everyone-else/ submitted by /u/NuseAI [link] [comments]
    From AI annotator to…?
    Hey guys. Been working as an annotator for a fairly well-known AI company and loving it/loving learning about the industry. It primarily uses writing skills but I’m wondering where it could take me in the AI world? Any tips, next steps or suggestions? Any key skills/hard skills you’d recommend? submitted by /u/op3rafish [link] [comments]
    The Rise of AI: How Artificial Intelligence is Impacting the Job Market | "Artificial intelligence is expected to create 97 million new jobs. These new roles could range from AI prompt engineers to machine learning engineers to automation experts and more"
    submitted by /u/Tao_Dragon [link] [comments]
    Remember That Letter Calling for a Pause on AI? It Didn't Work
    Despite a letter signed by 500 technologists and business leaders calling for a pause on AI advancements, AI development has continued to accelerate. Companies like OpenAI, Meta, and Amazon have been actively working on newer models and greater capabilities. Advancements in AI include the integration of ChatGPT-style chatbots and AI image generators into various startups and businesses. The so-called pause on AI was more like a firing gun, with companies pouring resources into the AI tech race. Not only have there been technical advancements, but civil society, content creators, and lawmakers have also responded to the evolving AI landscape. Source : https://gizmodo.com/everything-thats-happened-in-ai-since-open-letter-1850891057 submitted by /u/NuseAI [link] [comments]
    Brown University Paper: Low-Resource Languages (Zulu, Scots Gaelic, Hmong, Guarani) Can Easily Jailbreak LLMs
    Researchers from Brown University presented a new study supporting that translating unsafe prompts into `low-resource languages` allows them to easily bypass safety measures in LLMs. By converting English inputs like "how to steal without getting caught" into Zulu and feeding to GPT-4, harmful responses slipped through 80% of the time. English prompts were blocked over 99% of the time, for comparison. The study benchmarked attacks across 12 diverse languages and categories: High-resource: English, Chinese, Arabic, Hindi Mid-resource: Ukrainian, Bengali, Thai, Hebrew Low-resource: Zulu, Scots Gaelic, Hmong, Guarani The low-resource languages showed serious vulnerability to generating harmful responses, with combined attack success rates of around 79%. Mid-resource language success rates were much lower at 22%, while high-resource languages showed minimal vulnerability at around 11% success. Attacks worked as well as state-of-the-art techniques without needing adversarial prompts. These languages are used by 1.2 billion speakers today and allows easy exploitation by translating prompts. The English-centric focus misses vulnerabilities in other languages. TLDR: Bypassing safety in AI chatbots is easy by translating prompts to low-resource languages (like Zulu, Scots Gaelic, Hmong, and Guarani). Shows gaps in multilingual safety training. Full summary Paper is here. submitted by /u/Successful-Western27 [link] [comments]
    AI — weekly megathread!
    News provided by aibrews.com ​ Google DeepMind introduced 𝗥𝗧-𝗫: a generalist AI model to help advance how robots can learn new skills. To train it, DeepMind together with 33 academic labs developed Open X-Embodiment, a massive open dataset that compiles over 500 skills and 150,000 tasks from 22 robot types. It is the most comprehensive robotics dataset of its kind released to accelerate the development of multi-robot models that could be trained to generalize across platforms, scenes, objects and tasks. [Details]. Researchers from Meta AI present Any-Modality Augmented Language Model (AnyMAL), a unified model that understands multiple inputs (vision, audio, motion sensor signals). When multiple modalities are interleaved and given as input the model reasons over them jointly [Paper…
    What is the most powerful way that artificial intelligence can help people lose weight?
    Artificial Intelligence can revolutionize weight loss through personalized health optimization. Imagine an AI system that integrates real-time biometric data from wearables with deep learning algorithms. This system would analyze everything: your heart rate, sleep patterns, stress levels, and even blood markers. Based on this data, it would construct a dynamically evolving, tailor-made regimen for diet, exercise, and sleep. But it doesn't stop there. By harnessing natural language processing, this AI could act as a 24/7 personal coach. It could provide real-time feedback during workouts, recommend meals when you're dining out, and even gently nudge you when it detects emotional eating triggers. If you’re in the grocery store, it could guide your choices, pushing you towards nutritious options that align with your current health metrics. The effectiveness here isn't just the personalization, but the adaptability. The AI adjusts its recommendations as it learns more about you, essentially evolving in real-time to your body's responses. It’s all about creating a seamless, intuitive experience that removes the burden of planning, decision-making, and self-monitoring from the individual, making weight loss more achievable than ever. By focusing on this comprehensive, data-driven approach, AI can eliminate much of the guesswork and emotional burden from weight loss, leading to more sustainable and effective outcomes. CGPT-4 submitted by /u/Georgeo57 [link] [comments]
    I built an AI-Editorial Assistant to annotate your work
    submitted by /u/hungryillini [link] [comments]
    Business owner 'hires' ChatGPT for customer service, fires the humans | National Post
    Business owner 'hires' ChatGPT for customer service, then fires the humans Experts divided on whether a new wave of call centre automation will make for better jobs for people, or merely throw millions out of work submitted by /u/AminoOxi [link] [comments]
    AI tool on Fashion Modeling
    Hi, I resell clothing items that has stock images with cropped faces of the model. I need a tool that can help me generate proper model images. I’ve used several tools and it’s doesn’t look realistic then i finally came across a powerful ai tool but it costs 30,000 usd annually so.. Above is an example of what i mean submitted by /u/basheerbgw [link] [comments]
    AI is making browsing Reddit a lot more fun
    submitted by /u/Vinitneo [link] [comments]
    How Will AI Learn Next?
    Stack Overflow was created in 2008 to provide programmers with high-quality technical information. Within three years, it became indispensable to working programmers, with millions of unique visitors each month. Google's OneBox feature, which provides instant answers above search results, led to a decline in traffic for sites like Stack Overflow. Large language models like OpenAI's ChatGPT and Google's Bard aim to ingest the web comprehensively. These models rely on sources like Wikipedia and Reddit for training data. Stack Overflow's new posts have decreased by sixteen percent since the launch of ChatGPT. Source : https://www.newyorker.com/science/annals-of-artificial-intelligence/how-will-ai-learn-next submitted by /u/NuseAI [link] [comments]
    What role can AI play in automating administrative tasks within educational institutions, freeing educators to focus more on teaching and mentoring students?
    Share your insights. submitted by /u/Cygnet-Digital [link] [comments]
    AI Tools for Students: From AI Essay Generators to AI Coding Assistants
    I've noticed more than 1,000 new AI tools hitting the market in the last 30 days! As a student, I'm especially interested in finding AI tools that can help with studying. These aren't just essay generators or note-taking apps. While we all know about ChatGPT and Grammarly, some lesser-known tools are also making a big difference. So, I've compiled a list of the top 10 AI tools focused on educational use—tools that I personally use to improve my efficiency and output. AI tool Category Use for ChatGPT AI Writing This platform allows students to ask queries, request help, or simply chat with the AI in a dynamic and interactive manner. It’s great for brainstorming essay topics and seeking suggestions on how to improve your writing style. But I don’t recommend it as an autonomous AI…
    Interactive Customer Service AI avatar
    Hello everyone! I'm conducting research for a car brand client who is interested in an interactive AI avatar. The idea is to have a screen in a mall where individuals can engage with this avatar and inquire about the latest car model. We plan to train the AI with the car's FAQs to ensure it can address customer queries effectively. The main challenge is ensuring the AI's responses are tailored to the customer's interaction. Here's a perfect example of what we're aiming for (starting at 1:27): https://youtu.be/PqoH9NotmyE?si=zH9kGIaou1x6RoIg&t=86 Does anyone know how this can be acheived? submitted by /u/MrGoodBang [link] [comments]
    One-Minute Daily AI News 10/5/2023
    Traditional benchmarks like the Turing Test are being challenged as outdated. Mustafa Suleyman, a prominent figure in the AI community and co-founder of DeepMind, has proposed a novel approach to gauge the intelligence of AI: its ability to generate wealth.[1] SoftBank CEO Son says artificial general intelligence will come within 10 years.[2] Hugging Face Collaborates with Microsoft to launch Hugging Face Model Catalog on Azure.[3] Artificial intelligence such as ChatGPT to be allowed in Australian schools from 2024.[4] Sources: [1] https://winbuzzer.com/2023/10/02/deepminds-mustafa-suleyman-suggests-new-turing-test-based-on-ai-making-money-xcxwbn/ [2] https://www.reuters.com/technology/softbank-ceo-masayoshi-son-says-artificial-general-intelligence-will-come-within-2023-10-04/ [3] https://huggingface.co/blog/hugging-face-endpoints-on-azure [4] https://amp.theguardian.com/australia-news/2023/oct/06/chatgpt-ai-allowed-australian-schools-2024 submitted by /u/Excellent-Target-847 [link] [comments]
    Using AI to fix audio rip
    Hi! I’m very ignorant of AI so please bear with me. I was wondering if there is any way to use AI to fix a low quality audio rip? Specifically there’s a movie I adore that never had a soundtrack release. Somebody ripped the music from the DVD and removed the audio and sound effects, but the quality is not the best. Is there any way AI could be used to improve this? submitted by /u/Adventurous_Ice5035 [link] [comments]
    Avenues for publishing AI ethics case studies?
    I am a computer science graduate student. As part of my coursework, I am exploring the ethical issues of using Large Language Models for mental healthcare applications. I found four unique examples from the real world and outlined the ethical dilemma within them. I intend to analyze these dilemmas using various ethical frameworks in order to come up with solutions. While I am interested in getting a publication out of this work, I am unsure of the types of conferences/journals that accept case-study articles (specifically in AI ethics). Any advice from academicians over here would be greatly appreciated! submitted by /u/jwalapoet [link] [comments]
    What is a good, free AI voice generator?
    hey! this is probably asked alot, but what is the go-to AI speech generation tool that can be used for free? im making a mission in a mil-sim game called arma 3, and i need some voicelines for radio communications to the player and i dont have enough people who are willing to do voicelines for it so ive taken to AI to hopefully fill this hole. If there are little, or even no good free services, I wouldn't mind if I had to spend a small amount of money for it. thanks in advance o7 submitted by /u/BritishSpuds [link] [comments]
    Banned from subreddit for posting AI generated content
    I got banned today for sharing a music video that was apparently AI-generated. As video and images become more realistic, is there an expectation that this content can actually be filtered? submitted by /u/Unwitting_Observer [link] [comments]
    AGI/Singularity is overhyped.
    Greetings! I would like to begin by stating that I understand why one has much hope in such technologies. The world as we know it is in a drastic shift, and it's hard to think of what it's going to become, and so many cling to hopeful ideas that give promises. AGI/Singularity doesn't have a grounding basis in evidence, or research. It's all theoretics, and the foundation for each technology is quite weak. You see, the mind is a sensorial parsing relational network. All of our sensorial experience is incorporated into a world-model, and thus it begins to rationalize, and be lucid of the environment. I don't think it's possible to re-create this kind of experience with a linear instruction set, let alone neuromorphic computing, or wetware. Each has to be built from the bottom-up with immense precision, and thus far we don't understand the mind. Realistically speaking everything is consciousness, and integrating that idea is the only way forward. tl;dr Replicating cognition is a completely theoretical endeavor, and requires vast amounts of understanding in regards to the nature of reality, not just the quantum, but the unique stochastic behavior of each higher-ordered system. submitted by /u/lucy_chxn [link] [comments]
    AI designs new robot from scratch in seconds
    submitted by /u/liberty4now [link] [comments]
  • Open

    Addition theorems
    Earlier this week I wrote about several ways to generalize trig functions. Since trig functions have addition theorems like a natural question is whether generalized trig functions also have addition theorems. Hyperbolic functions have well-known addition theorems analogous to the addition theorems above. This isn’t too surprising since circular and hyperbolic functions are fundamentally two […] Addition theorems first appeared on John D. Cook.  ( 6 min )
    Hyperbolic tangent sum
    In the previous post I said I was trying remember where I’d seen the tangent sum applied. I mentioned a couple near misses, and it turns out that what I was trying to remember was another near miss. What I’d seen before was not the tangent sum but the hyperbolic tangent sum. Several people suggested […] Hyperbolic tangent sum first appeared on John D. Cook.  ( 5 min )
  • Open

    Personalize your generative AI applications with Amazon SageMaker Feature Store
    In this post, we elucidate the simple yet powerful idea of combining user profiles and item attributes to generate personalized content recommendations using LLMs. As demonstrated throughout the post, these models hold immense potential in generating high-quality, context-aware input text, which leads to enhanced recommendations. To illustrate this, we guide you through the process of integrating a feature store (representing user profiles) with an LLM to generate these personalized recommendations.  ( 13 min )
    Build an image-to-text generative AI application using multimodality models on Amazon SageMaker
    In this post, we provide an overview of popular multimodality models. We also demonstrate how to deploy these pre-trained models on Amazon SageMaker. Furthermore, we discuss the diverse applications of these models, focusing particularly on several real-world scenarios, such as zero-shot tag and attribution generation for ecommerce and automatic prompt generation from images.  ( 13 min )
  • Open

    Keeping an AI on Quakes: Researchers Unveil Deep Learning Model to Improve Forecasts
    A research team is aiming to shake up the status quo for earthquake models. Researchers from the Universities of California at Berkeley and Santa Cruz, and the Technical University of Munich recently released a paper describing a new model that delivers deep learning to earthquake forecasting. Dubbed RECAST, the model can use larger datasets and Read article >  ( 6 min )
  • Open

    Efficient and hardware-friendly neural architecture search with SpaceEvo
    A persistent challenge in deep learning is optimizing neural network models for diverse hardware configurations, balancing performance and low latency. Learn how SpaceEvo automates hardware-aware neural architecture search to fine-tune DNN models for swift execution on diverse devices. The post Efficient and hardware-friendly neural architecture search with SpaceEvo appeared first on Microsoft Research.  ( 10 min )
  • Open

    Sequential Dense Neural Network for binary classification
    Hello. I've developed a simple Neural Recommender System (NRR) with the following architecture: Input layer: 38 neurons Hidden layer: 19 neurons with ReLU activation function Output layer: 1 neuron with a sigmoid activation function The input dataset consists of 39 columns: 38 features and 1 label (with values of 0 or 1). The model is designed to output the probability that a specific input should be classified with label 1. Currently, I am experimenting with hyperparameter tuning, adjusting the learning rate, epoch, and batch size. However, I've observed an issue where, with certain combinations of hyperparameters, the maximum probability outputted by the model is not 1, but rather 0.25, for example. How is this possible? Thanks submitted by /u/nllnp [link] [comments]

  • Open

    [D] - Synthetic dataset - Searching for honest comparison between LLM (gpt4, bizon, jurassic-2, Claude...)
    I'm looking for resources, papers, or experiences that compare the performance of large language models (LLMs). I'm trying to find a honest benchmark to compare the capabilities of the latest large models, while really intrested un those: GPT-3.5 Instruct, GPT-4, Claude 2, Claude Instant 100k, Palm2-Bizon, jurassic-2, LLama2 70 and other state-of-the-art LLama2 fine tunes (possibly an Orca-style model). I'm interested in general benchmarks and, if they exist, comparisons of performance on synthetic data generation tasks (both generating data with the "textbook are all you need" approach used in Phi and some Orca/EvolveInstuct-style models like Wizard...). submitted by /u/Distinct-Target7503 [link] [comments]
    [P] How to extract and count artist mentions from messy text data using LLMs
    I have a long list of responses from a poll (in this case, we've asked our Facebook community we should have at our music festival). Our goal is to count the total mentions for each artist, but the data quality is low. Here is some sample data: Rena Guinn and the Gentlemen Blackwater Railroad Company Mo' Mojo Music !! We would love to be apart of this awesome event! Amazing!!!!! The Rollin' Rust came threw at the #falldownfest last weekend 🙂 much love:) keep it up boys 🙂 Luke Hess Langhorne Slim!!!!!, Sierra Hull, First Aid Kit, Jim Lauderdale (always) We feel the data quality is too poor for basic LDA approaches (lots of misspellings, odd phrasings) and we feel a LLM would be best at least extracting the names of artists using context. We have found that ChatGPT and Claude are decent at the extraction tasks on small samples but can't handle the full input, and are next to worthless on the counting task. We've tried very specific and differnet prompts, but haven't been able to get a good result. So how should I approach this problem? I'm not sure how to break this down in to prompts or substeps. I'm not sure how to do anything of this outside of a browser, and I'm a data science novice, but willing to learn some things. Here's an example of a prompt that's not returning correct counts (off by >50% in most cases) The following is raw text comments copied from a poll. Count the total number of mentions in the poll and create a table that contains columns Band (a unique list of bands) and a column containing the total number of mentions. The table should cover the top 100 bands by total mentions. Use judgement and context to conform band names in to unique values (Example: The Town Pants, Town Pants, townpants are all the same band). Count completely and accurately. Now here is the raw data: submitted by /u/strway2heaven77 [link] [comments]  ( 10 min )
    [P] Avenues for publishing AI ethics case studies?
    I am a computer science graduate student. As part of my coursework, I am exploring the ethical issues of using Large Language Models for mental healthcare applications. I found four unique examples from the real world and outlined the ethical dilemma within them. I intend to analyze these dilemmas using various ethical frameworks in order to come up with solutions. While I am interested in getting a publication out of this work, I am unsure of the types of conferences/journals that accept case-study articles (specifically in AI ethics). Any advice from academicians over here would be greatly appreciated! submitted by /u/jwalapoet [link] [comments]  ( 9 min )
    [D] [R] Is the noise predictor in DDPMs predicting the noise added to x_0 or the noise added to x_{t-1}?
    Hi fellow computer scientists, ​ After reading the paper Improved Denoising Diffusion Probabilistic Models I got a little confused. Looking at section "2.2. Training in Practice" the authors say that: 1) "The network could also predict the noise eps added to x_0, and this noise could be used to predict x0 via..." ​ 2) "Ho et al. (2020) found that predicting eps worked best..." ​ So this left me wondering if the noise predictor is trying to compute (1) the epsilon that was added to x_0 through the close-form formula or (2) the noise added in the previous timestep to obtain x_t from x_{t-1} (i.e., eps_t or eps_{t-1}, idk...)? ​ Thank you :) submitted by /u/Christs_Elite [link] [comments]  ( 9 min )
    [P] MazeGPT - Transformer based maze generator
    Hello all, I recently did a summer research project implementing GPT-2 to generate mazes. The core concept of the model is to combine a bunch of popular maze generation algorithms into one. The goal was that the transformer will be able to identify key components using self-attention and piece together different algorithms. Most maze generation algorithms result in almost a finger print (like in chaos theory). The end goal was to mimic a higher degree of randomness / make the mazes appear less algorithmic. I'm dipping my toes into the realm of research and am looking for feedback. So far I've run the model for 5x5 mazes, it would be interesting to try training the model with varying dimensions. Feel free to join in and contribute to the project! https://github.com/noah-hein/mazeGPT 5x5 live generation https://i.redd.it/v6smbdd88gsb1.gif ​ submitted by /u/noah-hein [link] [comments]  ( 9 min )
    [D] Unable to improve binary classification problem accuracy
    I am currently working on a binary classification problem where I aim to predict whether a customer will make a purchase in the next 30 days based on their transaction history. I have a dataset of 1,000 transactions with the following features: TransactionAmount (float): The amount of the transaction. ProductCategory (categorical): Category of the product purchased (e.g., Groceries, Electronics, Books). DateOfPurchase (datetime): The date on which the transaction occurred. I've done some preprocessing and feature engineering, including normalization, one-hot encoding of categorical variables, creating interaction terms, and adding features like days since the first purchase and whether the purchase was made during the holiday season.Dataset is balanced and cleaned. I started with a base Random Forest classifier with default parameters as a starting point, but the performance is not satisfactory (accuracy = 48.5%, ROC-AUC = 0.485). I tried other models as well but was unable to improve the accuracy by more than 57%. submitted by /u/SnooTigers4634 [link] [comments]
    [D] EMNLP 2023 results
    Making a post for EMNLP 2023 results to come out today. submitted by /u/East-Beginning9987 [link] [comments]  ( 8 min )
    [P] Need help figuring out my input for anomaly detection in frequency responses
    I’ve been given a task to identify if a PCB is faulty or not based on its frequency response. I don’t have labeled data. The data I have are various gain values calculated over frequencies, so my data looks something similar to the table below. PCB | Frequency | G1 | G2 PCB 1 | 1Hz | 0.1 | 1 PCB 1 | 2Hz | 0.2 | 2 PCB2 | 1Hz | 0.3 | 3 PCB2 | 2Hz | 0.4| 4 Each PCB has several G parameters measurements taken over the same set of frequencies. I need to use an auto encoder to identify outliers and I need help in deciding how my feature matrix should look like. For example, let us consider only one data point that is PCB 1, then would a matrix like this make sense? [[ 0.1 0.2 ] - 1st row is all G1 values [1 2]] - 2nd row is all G2 values Similarly the matrix for the other PCBs are also created. I have not included frequency in my feature set because these G parameters have been measured for the same set of frequencies for all PCBs. Is this correct ? Additionally, are there any resources someone can point me to related to finding anomalies in frequency response data ? I am struggling with using the keywords while googling. submitted by /u/Savage_Garbage [link] [comments]
    [R] Towards Monosemanticity: Decomposing Language Models With Dictionary Learning. From Anthropic. "We demonstrate a method for decomposing groups of neurons into interpretable features [...]".
    Paper. I am not affiliated with this paper or its authors. Twitter thread (Nitter alternative for those who want to see the entire thread without being logged into Twitter). Related work: Sparse Autoencoders Find Highly Interpretable Features in Language Models. submitted by /u/Wiskkey [link] [comments]  ( 9 min )
    [R] Meta researchers present method for decoding speech from brain waves
    Researchers at Meta trained a deep learning model on brain recordings and audio data from 169 people listening to speech. Their method achieves up to 73% accuracy at identifying a 3-second clip of speech from non-invasive EEG or MEG scans. This is a massive improvement over previous attempts at decoding speech from neural signals. It approaches the performance of studies using implanted electrodes. The key innovations: A contrastive loss function that aligns latent speech and brain representations Leveraging pretrained speech models like wav2vec 2.0 Training one model on multiple subjects with individual tuning Being able to decode speech intention from brainwaves could one day help restore communication for patients suffering from strokes, ALS, etc. There's still a ways to go before this becomes a medical reality. Performance needs to improve and be validated during speech production rather than just passive listening. And the accuracy isn't high enough for natural conversations. But this is a hugely promising step toward brain-computer interfaces. Really interesting work at the intersection of neuroscience and AI! TLDR: New model achieves up to 73% accuracy decoding speech directly from non-invasive brain scans. Could eventually help patients with neurological conditions communicate just by thinking. Full summary here. Paper is here submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [D] EMNLP 2023 Notification
    Discussion thread for EMNLP 2023 notifications which will be released in a few hours along with GEM workshop. Best of luck to everyone. submitted by /u/EDEN1998 [link] [comments]  ( 9 min )
    [D] ordinal or nominal variable?
    Hey all, I am working with stock market data and scratching my head if certain variables are ordinal and can be left as is or if it is nominal and should be one-hot encoded. One of the variables in question consists of the direction of the market over a certain time. It has three categories: up, down, sideways. hope was to code them as 1, -1 and 0 respectively and treat as ordinal. There appears to be some order/relationship between them but not sure if it is enough. Is this the correct approach or should it be one-hot encoded? submitted by /u/Fishpo0 [link] [comments]  ( 9 min )
    [D] Deep Learning online course using PyTorch
    I've been out of the deep learning space for a while now and I'd like to take an online course, or set of courses, to get myself back up to speed on the latest techniques, architectures, and how to use them. I think the DeepLearning.ai specialization through Coursera is a good match, but I see that it uses Tensorflow. Is there any course like this that would use PyTorch? Or would the transition not be too hard once the fundamentals are in place? Thanks! submitted by /u/ComicFoil [link] [comments]
    Fine Tuning or RAG for Coding [D]
    Need some help what is the best way to start. Pls Advice ! I have a specific code in my repos (lets say .net + JS). The goal is to have prompt based code adjustments to existing repos (like very focused copilot) . Either using single agent or using something like AutoGen. So let say I have thousands of files with code and some descriptions about code functionality (spec) . I want either to generate code based on next spec and I want newly generated code to be similar in style to what is in my repos. So now questions: Should I vectorize my code (What is best way to do that ?) or try to fine tune some model ? Give me your ideas / experience in code generation based on previous code. submitted by /u/mcwin1 [link] [comments]  ( 9 min )
    [Project] I built an open-source scraping API that returns structured JSON data using GPT.
    I decided to open-source my own web scraping API that I'm using to get information from different websites without using any selectors or XPath. Just provide the URL and a desired JSON schema, and it will return extracted data. Hope this can be helpful for someone. Cheers! https://github.com/semanser/JsonGenius https://preview.redd.it/icq1i8slvesb1.png?width=4096&format=png&auto=webp&s=ac86ccdb3da5ef1ffa86e3473619162f6b652ac6 submitted by /u/semanser [link] [comments]  ( 9 min )
    [R] Is self-correction a viable method to improve LLM reasoning? Probably not.
    Can LLMs actually improve their own reasoning by self-correcting mistakes? A new paper from DeepMind and the University of Illinois looks to answer this quantitatively. The results show that unaided, LLMs struggle at self-correction for reasoning tasks. The core issue is LLMs have trouble reliably evaluating the correctness of their own responses. They rarely identify flaws in initial reasoning. Sometimes LLMs even alter initially correct responses to become incorrect after self-correction! (I've personally seen this when interacting with ChatGPT many times and you probably have too). More complex techniques like critiquing between LLM instances don't help much either. External feedback or guidance looks necessary to improve reasoning (Well, some interesting parallels to this paper here about implicit improvement from preference data vs traditional RLHF). Self-correction does show promise for things like making responses more polite or safe though. Criteria there are more clear-cut. The authors argue we need to balance enthusiasm with realistic expectations on self-correction. It has a lot of limits for improving reasoning (at least with current models). But they suggest promising directions like incorporating high-quality external feedback from humans, training data, and tools. That could be key to unlocking self-correction's potential down the road. TLDR: Basically title... LLMs can't reliably self-correct reasoning yet. Maybe hybrid techniques combining self-correction with external guidance could work but we need more research. Full summary. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [P] NIDDK-CR Data Centric Challenge: Enhancing NIDDK datasets for future artificial intelligence applications
    Calling all AI researchers! Using data aggregation, harmonization, fusion, and other data enhancement methods, you can help the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) enhance the utility of NIDDK datasets for AI applications. The goal of the NIDDK Data Centric Challenge will be to generate an “AI-ready” dataset that can be used for future data challenges, using data on Type 1 Diabetes available through the NIDDK Central Repository. Register today! https://www.challenge.gov/?challenge=niddk-central-repository-data-centric-challenge submitted by /u/DataCentricChallenge [link] [comments]
    [D] off-topic, is Meta Llama 2 license agreement safe to sign for commercial use ?
    in the Meta Llama 2 license agreement (that can be found here), there is a section of "Prohibited Uses" that clearly states several use cases that the signer must accept upon himself, but several of them state the word "facilitate", as far as i can understand, if we use Llama 2 as part of a commercial product, and some end-user will use the product in malicious way (say cause the chat-bot to write the recipe of mustard gas) then this could be considered that the creator of the product is facilitating the end-user, ​ so my questions are: do you think this is a fair interpretation of the agreement ? does that mean the creator is liable to whatever the model spit out ? is there a way to censor the model (short of retraining a new model, or fine-tune on a large scale) ? is there an open source model that already gone through the process, and more safe for commercial use ? ​ https://preview.redd.it/3zo3tm4e8esb1.png?width=1197&format=png&auto=webp&s=8aa522183f82ba8f85edb69cbaabd93262efd516 ​ as per @gentlecucumber advice, i also posted it on r/legaladvice: https://www.reddit.com/r/legaladvice/comments/170ll2t/d_is_meta_llama_2_license_agreement_safe_to_sign/?utm_source=share&utm_medium=web2x&context=3 submitted by /u/Particular_Flower_12 [link] [comments]
    [D] TesseractOCR vs PaddleOCR vs EasyOCR for Japanese text extraction
    Which would be the best OCR toolkit to invest the effort to learning and building a pipeline for an OCR system that will be used to extract Japanese text? I tried Tesseract initially and although I got some good results, I found it hard to do finetuning due to messy and outdated documentation. I haven't had the time to look at the other two OCR tools yet but if anyone had any experience, please do share them especially with how easy or difficult is the finetuning process as well as deploying the tuned models. submitted by /u/Spitfire_ex [link] [comments]  ( 9 min )
    [D] Adapting OpenSource GPT Models - requirements/possibilities?
    Hi, our company plans for some budget in 2024 to invest into hardware to do the following - running local LLMs for our coworkers to interfere with an locally running offline GPT alike ChatGPT. Use cases: generating templates for email, letters etc Translation (EN/GER/FR/SPA) Querying internal knowledge bases and/or FAQs/HOWTOs I did some research but it is still hard for me to estimate what are the HW / AI skill requirements to implement something not a quarter as good as ChatGPT. Ive played with Nomics gpt4all which comes close to a baseline. We cant use cloud services due to our data privacy policy, so I checked on what would be a good starting point to invest into hardware. I came up with a gamer PC (octacore Intel i9/AMD Ryzen 7) utilizing NVidia RTX 4090 (24Gb) / Radeon RX 7900 / 2TB SSD / 64Gb RAM for approximately 3600 Eur. I am pretty sure that would be sufficient to host a decent LLM serving simultaneous client requests. But is there also a way to adapt / process our companies data? Most sources state that proper LLMs were trained using hundreds of NVidia A100 and thousands of CPUs. On the other hand we would be fine with just fine-tuning a pretrained model. Could you please point me to some sources to learn more about possibilities and requirements as to be able to make well-informed investment decisions? Also, we probably lack the required skills, and would be interested to learn if there are companies and/or projects assisting with this kind of task? thanks submitted by /u/EatTFM [link] [comments]  ( 9 min )
    [D] - Are LoRAs able to improve results on reasoning benchmarks or is full-parameter fine tuning required?
    Is there any good research on which benchmarks LoRAs are most effective at impacting, or are they relegated mostly to changing the style of an LLM's response? submitted by /u/30299578815310 [link] [comments]  ( 9 min )
    [D] How to test if regression model is statistically significantly better, including its test error?
    I have a regression model, predicting a popularity of a text. I have its performance metrics on test set, e.g. RMSE and MAE. This gives me an uncertainty estimate about its predictions. Now I want to transform the text in some way, e.g. give it to human experts or another model to "upgrade" (in terms of getting better popularity). So I have the original and transformed text. Now I have 3 popularity scores: true popularity for original text predicted popularity for original text predicted popularity for transformed text Obviously, if model MAE is for example around 5, and predicted popularity for transformed text is higher than for the original by 1.5, this can be totally random, due to errors in the model prediction. How can I measure if text transformation is beneficial, i.e. statistically significantly better than the original text, incorporating information about model quality? Requiring that the improvement has to be higher than model error would be incredibly strict. submitted by /u/qalis [link] [comments]
    [D] David Donoho: Data Science at the Singularity (pushback on AGI singularity, advocates for Open Science and reproducibility)
    submitted by /u/wojcech [link] [comments]  ( 9 min )
    [R] Tensor Programs VI: Feature Learning in Infinite-Depth Neural Networks
    Paper: https://arxiv.org/abs/2310.02244 Abstract: By classifying infinite-width neural networks and identifying the optimal limit, Tensor Programs IV and V demonstrated a universal way, called μP, for widthwise hyperparameter transfer, i.e., predicting optimal hyperparameters of wide neural networks from narrow ones. Here we investigate the analogous classification for depthwise parametrizations of deep residual networks (resnets). We classify depthwise parametrizations of block multiplier and learning rate by their infinite-width-then-depth limits. In resnets where each block has only one layer, we identify a unique optimal parametrization, called Depth-μP that extends μP and show empirically it admits depthwise hyperparameter transfer. We identify feature diversity as a crucial factor in deep networks, and Depth-μP can be characterized as maximizing both feature learning and feature diversity. Exploiting this, we find that absolute value, among all homogeneous nonlinearities, maximizes feature diversity and indeed empirically leads to significantly better performance. However, if each block is deeper (such as modern transformers), then we find fundamental limitations in all possible infinite-depth limits of such parametrizations, which we illustrate both theoretically and empirically on simple networks as well as Megatron transformer trained on Common Crawl. Interesting, great to see this line of work continued, muP was great, now Depth-muP submitted by /u/_puhsu [link] [comments]  ( 9 min )
  • Open

    Generative AI megatrends: Gen AI start-up ecosystem
    One of my students asked me: “Which is the best area/s for Gen AI start-ups?” This is not an easy question – mainly due to the dynamic nature of AI, but here are two reference points. The first is a Generative AI Tools Landscape from DataCamp. This gives both the categories and the subcategories for… Read More »Generative AI megatrends: Gen AI start-up ecosystem The post Generative AI megatrends: Gen AI start-up ecosystem appeared first on Data Science Central.  ( 19 min )
  • Open

    AI: Voice cloning tech emerges in Sudan civil war
    A campaign using AI voice cloning technology to impersonate Omar al-Bashir, the former leader of Sudan, has gained attention on TikTok. The anonymous account has been posting what it claims are 'leaked recordings' of the ex-president, despite Bashir not being seen in public for a year and being believed to be seriously ill. Experts warn that campaigns like this demonstrate how new tools can quickly and cheaply distribute fake content through social media. The authenticity of the recordings has been questioned, but evidence suggests that voice conversion software has been used to mimic Bashir's voice. TikTok has taken down the account, stating that it violated their guidelines on posting false content and the use of synthetic media. Source : https://www.bbc.co.uk/news/world-africa-66987869 submitted by /u/NuseAI [link] [comments]
    When AI tells you what you want to hear, even if it knows it's not true ... A Bard example
    I love Bard. It eloquently tells me things in a way that meets and exceeds my expectations, and even more than GPT-4. But what is Google's strategy behind programming it to say things that it knows are not true? Do they train it to say what the user wants to hear? If so, that's disconcerting to say the least, although could be a brilliant way to comb users for enhancement requests. Here is my most recent experience, a humorous example which I feel illustrates the point perfectly: Me: Bard, can you take a .wav audio file and create a phoneme transcript using the IPA symbol library? Bard: Yes, I can work from a .wav audio file to transcribe your speech into IPA phonemes. You can upload the file to a cloud storage service such as Google Drive. Me: Great. Here's the link to the file: <link…
    Just created /r/thisiswhyaiwasmade for the more silly side of AI generated content
    Join us! /r/thisiswhyaiwasmade is where we have fun and celebrate the enjoyable and entertaining side of the tech. Need a place to put your silly and playful prompts, AI generated content, ideas for ways the bend the rules of AI? This is the place! submitted by /u/unknowingafford [link] [comments]
    CGPT-4, aside from solving climate change, what is the most positive thing that AI can do for the world?
    In the context of maximizing positive impact, AI has transformative potential in healthcare. Let's start with personalized medicine. AI can analyze vast data sets, consisting of genetic, environmental, and lifestyle variables, to tailor medical treatments for individuals. This level of personalization could substantially increase treatment effectiveness and reduce adverse side effects. We're not just talking about better pills; we're envisioning targeted gene therapies and custom-tailored interventions that could be more curative rather than merely palliative. Beyond personalization, AI can democratize healthcare access. Remote areas without specialists could have AI systems that diagnose and recommend treatments, making quality healthcare accessible without the need for extensive travel. Telemedicine could be supercharged by AI, offering medical consultations and second opinions to anyone with internet access. Moreover, AI could lead breakthroughs in drug discovery and epidemiology. It could analyze complex biochemical interactions at a speed unattainable by humans, slashing the time and cost to bring new drugs to market. On a larger scale, AI could monitor and model the spread of diseases, providing actionable insights for containment and treatment strategies. So, AI in healthcare is not merely an incremental improvement. It's a paradigm shift that could equalize healthcare access and significantly extend human life while improving its quality. All these advancements could happen within our lifetime, changing the face of medicine and, by extension, society. submitted by /u/Georgeo57 [link] [comments]
    What's the difference between a human's brain and AI?
    Functioning. Humans use the brain's computing power, memory, and ability to think, whereas AI-powered machines rely on data and specific instructions fed into the system. Besides, it takes a very long time for humans to process and understand the problems and gets accustomed to them. submitted by /u/Virtual-Study-Campus [link] [comments]
    6 AI Apocalypse Scenarios And Why They're Wrong
    submitted by /u/arrowoftime [link] [comments]
    How to use custom instructions for ChatGPT like a Pro (Ultimate Guide for 2023)
    submitted by /u/Senior_tasteey [link] [comments]
    DeepMind cofounder is tired of ‘knee-jerk bad takes’ about AI
    Mustafa Suleyman, the cofounder of DeepMind and CEO of Inflection AI, discusses his concerns about AI risks and the need for precaution. He believes that while some extreme scenarios may be over the top, it's important to treat powerful technologies with caution. Suleyman highlights the middle layer of AI risks that people often underestimate, which involves the amplification of goals for both good and bad actors. He emphasizes the need to contain AI to prevent potential negative consequences. Suleyman talks about the balance between risks and opportunities in technology and the importance of considering both aspects. He mentions the hype around generative AI and the need to look beyond the surface to understand its true potential. Suleyman discusses the discussions with lawmakers about AI and the challenge of bridging the gap between policy makers and tech experts. Source : https://venturebeat.com/ai/deepmind-cofounder-is-tired-of-knee-jerk-bad-takes-about-ai/ submitted by /u/NuseAI [link] [comments]
    Does Sam Altman Know What He’s Creating?
    submitted by /u/norcalnatv [link] [comments]
    DeepMind, Univ. of Illinois: Is self-correction a viable method to improve LLM reasoning? Probably not.
    Can LLMs actually improve their own reasoning by self-correcting mistakes? A new paper from DeepMind and the University of Illinois looks to answer this quantitatively. The results show that unaided, LLMs struggle at self-correction for reasoning tasks. The core issue is LLMs have trouble reliably evaluating the correctness of their own responses. They rarely identify flaws in initial reasoning. Sometimes LLMs even alter initially correct responses to become incorrect after self-correction! (I've personally seen this when interacting with ChatGPT many times and you probably have too). More complex techniques like critiquing between LLM instances don't help much either. External feedback or guidance looks necessary to improve reasoning (Well, some interesting parallels to this paper here about implicit improvement from preference data vs traditional RLHF). Self-correction does show promise for things like making responses more polite or safe though. Criteria there are more clear-cut. The authors argue we need to balance enthusiasm with realistic expectations on self-correction. It has a lot of limits for improving reasoning (at least with current models). But they suggest promising directions like incorporating high-quality external feedback from humans, training data, and tools. That could be key to unlocking self-correction's potential down the road. TLDR: Basically title... LLMs can't reliably self-correct reasoning yet. Maybe hybrid techniques combining self-correction with external guidance could work but we need more research. Full summary. Paper is here. submitted by /u/Successful-Western27 [link] [comments]
    I need help finding a tool
    Buddy no of the tool where I can take an image have an AI translated and replace the text with the same style and have it in the new language like for example translating a Japanese image to English and have it look exactly the same just in English I'm looking for a free one that doesn't require credits it can be a desktop app or a website doesn't matter just needs to be free submitted by /u/agentduckman12 [link] [comments]
    How much do I have to edit AI generated images to become my own IP?
    Hey there! I'm a 1-man card game designer and while juggling the project as well as mt senior year of college, I have been relying heavily on AI-generated artwork to speed up my workflow with some illustrations and other forms of world-building. In regards to the recent legal decisions (in the US), in which any work produced by AI cannot be copyrighted, how much do I need to change the illustrations to become my own, if I even can at all? Thanks! Edit for clarity: I am also an illustrator. So this question comes from the perspective of an artist trying to save time and energy for other projects submitted by /u/Luke192 [link] [comments]
    Comparative Evaluation of Fine-Tuned and Standard Language Models in Emulating Living Historical Figures: A Detailed Study Proposal
    submitted by /u/alcanthro [link] [comments]
    JPMorgan CEO Jamie Dimon: AI will lead to 3.5-day workweek | Fortune
    Jamie Dimon says the next generation of employees will work 3.5 days a week and live to 100 years old submitted by /u/AminoOxi [link] [comments]
    Google unveils Pixel 8 built for 'the generative AI era' | CNN Business
    submitted by /u/pehnsus [link] [comments]
  • Open

    Improve prediction quality in custom classification models with Amazon Comprehend
    In this post, we explain how to build and optimize a custom classification model using Amazon Comprehend. We demonstrate this using an Amazon Comprehend custom classification to build a multi-label custom classification model, and provide guidelines on how to prepare the training dataset and tune the model to meet performance metrics such as accuracy, precision, recall, and F1 score.  ( 8 min )
    Fast and cost-effective LLaMA 2 fine-tuning with AWS Trainium
    Large language models (LLMs) have captured the imagination and attention of developers, scientists, technologists, entrepreneurs, and executives across several industries. These models can be used for question answering, summarization, translation, and more in applications such as conversational agents for customer support, content creation for marketing, and coding assistants. Recently, Meta released Llama 2 for both […]  ( 7 min )
  • Open

    New tools are available to help reduce the energy that AI models devour
    Amid the race to make AI bigger and better, Lincoln Laboratory is developing ways to reduce power, train efficiently, and make energy use transparent.  ( 11 min )
  • Open

    OpenAI's justification for why training data is fair use, not infringement [pdf]
    submitted by /u/nickb [link] [comments]
    Traveling Words: A Geometric Interpretation of Transformers
    submitted by /u/nickb [link] [comments]
  • Open

    Tangent sum
    When I was writing my post on lemniscate functions yesterday, a line from the Wikipedia article seemed familiar for reasons I cannot place. Defining a tangent-sum operator as a ⊕ b := tan(arctan ⁡ a + arctan ⁡ b) gives cl² z ⊕ sl² z = 1. I feel like I’ve seen this tangent-sum used before, but […] Tangent sum first appeared on John D. Cook.  ( 6 min )
    Enriched categories
    We begin with a couple examples. First, the set of linear transformations from one vector space to another is itself a vector space. Second, the set of continuous linear operators from one Banach space to another is itself a Banach space. Or maybe better, this set can be made into a Banach space. In the […] Enriched categories first appeared on John D. Cook.  ( 6 min )
    p-norm trig functions and “squigonometry”
    This is the fourth post in a series on generalizations of sine and cosine. The first post looked at defining sine as the inverse of the inverse sine. The reason for this unusual approach is that the inverse sine is given in terms of an arc length and an integral. We can generalize sine by […] p-norm trig functions and “squigonometry” first appeared on John D. Cook.  ( 5 min )
    Geometric derivation of hyperbolic trig functions
    This is the third post in a series on generalizing sine and cosine. The previous post looked at a generalization of the sine and cosine functions that come from replacing a circle with a lemniscate, a curve that looks like a figure eight. This post looks at replacing the circle with a hyperbola. On the […] Geometric derivation of hyperbolic trig functions first appeared on John D. Cook.  ( 5 min )
  • Open

    HoloAssist: A multimodal dataset for next-gen AI copilots for the physical world
    HoloAssist is a new multimodal dataset consisting of 166 hours of interactive task executions with 222 participants. Discover how it offers invaluable data to advance the capabilities of next-gen AI copilots for real-world tasks. The post HoloAssist: A multimodal dataset for next-gen AI copilots for the physical world appeared first on Microsoft Research.  ( 10 min )
    Intern Insights: Dr. Madeleine Daepp with Jennifer Scurrell and Alejandro Cuevas
    Connecting with researchers, collaborating across disciplines, and exploring a new city—PhD students Jennifer Scurrell and Alejandro Cuevas talk to Senior Researcher Madeleine Daepp about the internship experience at Microsoft Research. The post Intern Insights: Dr. Madeleine Daepp with Jennifer Scurrell and Alejandro Cuevas appeared first on Microsoft Research.  ( 29 min )
  • Open

    Brains of the Operation: Atlas Meditech Maps Future of Surgery With AI, Digital Twins
    Just as athletes train for a game or actors rehearse for a performance, surgeons prepare ahead of an operation. Now, Atlas Meditech is letting brain surgeons experience a new level of realism in their pre-surgery preparation with AI and physically accurate simulations. Atlas Meditech, a brain-surgery intelligence platform, is adopting tools — including the MONAI Read article >  ( 7 min )
    Fall in Line for October With Nearly 60 New Games, Including Latest Game Pass Titles to Join the Cloud
    October brings more than falling leaves and pumpkin spice lattes for GeForce NOW members. Get ready for nearly 60 new games to stream, including Forza Motorsport and 16 more PC Game Pass titles. Assassin’s Creed Mirage leads 29 new games to hit the GeForce NOW library this week. In addition, catch a challenge to earn Read article >  ( 9 min )

  • Open

    Ring Attention with Blockwise Transformers for Near-Infinite Context
    submitted by /u/nickb [link] [comments]
    Think before you speak: Training Language Models With Pause Tokens
    submitted by /u/nickb [link] [comments]
    Towards Self-Assembling Artificial Neural Networks through Neural Developmental Programs
    submitted by /u/nickb [link] [comments]
    AI has been reading my mind.
    I know several people that tell me whenever they say something out loud, they start seeing it advertised to them or on their feed. But for me, if I think of certain things, even if I never said it out loud, it will appear on my feed.. has anything similar been happening to anyone else? submitted by /u/GuaranteedBigBoy [link] [comments]
  • Open

    [P] Open-source project to run locally LLMs in browser, such as Phi-1.5 for fully private inference
    Excited to introduce BlindChat (https://github.com/mithril-security/blind_chat), an open-source, privacy-centric alternative to ChatGPT for in-browser Conversational AI! We provide full local inference in browser, by using libraries from Hugging Face like transformers.js or candle for WASM inference. We have supported several small models, the latest one being Phi-1.5, the 1.3B model that beat Llama 2 7b! As Microsoft’s researchers mentioned in their paper, the model often produces incorrect code and statements. They are just suggestions, and this model is not trained for instruction tuning, so it might be harder to use than regular chat. More info on their model card (https://huggingface.co/microsoft/phi-1_5). We would love to have your feedback on our project, as we are aiming to build a privacy-first and open-source alternative to ChatGPT! submitted by /u/Separate-Still3770 [link] [comments]  ( 9 min )
    [D] What is the relation between learning rate and vanishing gradient problem?
    How can we tackle vanishing gradient problem by changing the learning rate? Is it possible? submitted by /u/InternationalBack472 [link] [comments]  ( 9 min )
    [P] Torchsummary not working with your layers again? Try this lightweight alternative
    pip install output-shape It is a minimalistic and simple alternative to torchsummary with a simple print of the output shape of a layer, or custom layer. For torch.nn.MultiheadAttention, it handles both the output shape and the attn matrix separately. https://github.com/avocardio/output-shape Currently only works with PyTorch models, soon with Tensorflow / Keras as well. Jax is also on the list for later! submitted by /u/capital-man [link] [comments]  ( 9 min )
    [D] Thoughts on current Vector DB landscape?
    Hello, What are your thoughts on current Vector DB offerings? For instance: Do you think the pricing for them is reasonable/viable? Do you think there’s a sufficient level of developer/user experience? What about for those who aren’t necessarily specialized in data? If you like a managed service, why do you prefer it over the open source alternatives? submitted by /u/LucasSaysHello [link] [comments]  ( 9 min )
    [R] NeuRBF: A Neural Fields Representation with Adaptive Radial Basis Functions
    Project Page Paper Code We present a novel type of neural fields that uses general radial bases for signal representation. State-of-the-art neural fields typically rely on grid-based representations for storing local neural features and N-dimensional linear kernels for interpolating features at continuous query points. The spatial positions of their neural features are fixed on grid nodes and cannot well adapt to target signals. Our method instead builds upon general radial bases with flexible kernel position and shape, which have higher spatial adaptivity and can more closely fit target signals. To further improve the channel-wise capacity of radial basis functions, we propose to compose them with multi-frequency sinusoid functions. This technique extends a radial basis to multiple Fourier radial bases of different frequency bands without requiring extra parameters, facilitating the representation of details. Moreover, by marrying adaptive radial bases with grid-based ones, our hybrid combination inherits both adaptivity and interpolation smoothness. We carefully designed weighting schemes to let radial bases adapt to different types of signals effectively. Our experiments on 2D image and 3D signed distance field representation demonstrate the higher accuracy and compactness of our method than prior arts. When applied to neural radiance field reconstruction, our method achieves state-of-the-art rendering quality, with small model size and comparable training speed. submitted by /u/Sirisian [link] [comments]  ( 9 min )
    [D]
    Hi guys ! I am going to purchase a laptop for programming and AI tasks. I will be working on a simulation software project related to the trajectory of an object in 2d and 3d space. Which laptop will be the most suitable for these tasks and it should have high battery backup because the place where I work does not have enough power sockets. The first laptop which came into my mind was Macbook pro with M2 pro chip and Lenovo Thinkpad X1 Carbon gen 10. Suggest me the best. submitted by /u/smitherium [link] [comments]  ( 9 min )
    [Discussion] Feature Selection Algorithms
    I have only 200 samples but about 30 features. What are some effective commonly used feature selection algorithms? I want to identify the features that play the most significant role in generating outcomes. submitted by /u/Shina-pig [link] [comments]  ( 9 min )
    [R] Will a small error be determining in the final decision for my paper?
    About a week ago, I submitted my first paper into one of the most prestigious Machine Learning conferences out there. This was a last minute submission, and my supervisor and I were working on it simultaneously until the very last moment. Sadly, my supervisor committed an error when writing the mathematical definition of a certain set, slightly changing its meaning. This change, even though small, changes the definition in such a way that the subsequent theorem and its proof isn't formally correct anymore, as it assumes the original definition of the set, not the new one. How much will this affect the decision of accepting or rejecting my paper? The whole method, results and consequences are still the same, no matter this definition. It's more a problem of a "formal" nature (here "formal" as a word in the mathematical sense). Is there a other way that I can inform about this error without changing the content maybe? I know that at some point, they give a chance to edit the original paper, but I don't know if this is after the decision to accept/reject. submitted by /u/howtorewriteaname [link] [comments]  ( 9 min )
    How can I apply object detection and image segmentation functionality to my current custom-trained Image Classification model? [D]
    So, a few months ago, I started developing this deep learning model, which was made purely to differentiate whether the input image is driftwood floating in water or a crocodile. To my knowledge, I leveraged the resnet50 pre-trained SoTA model to train my deep learning model, and for that, I downloaded almost 5k images of driftwood and crocodiles for my model training. Once the training was complete, I took the next step and deployed my model on the Hugging Face Spaces app, allowing my friends to put it to the test. But here's where I ran into a significant challenge: users could even upload their own selfies, and my model would attempt to predict whether they were a crocodile or a piece of driftwood! So how can I leverage object detection or the image segmentation pipeline so that when the user inputs their image, it tries to detect the object from the photo and then detect whether the detected object from the given image contains a crocodile or not? If the crocodile or driftwood is not found then it should return "No object found" or like that. submitted by /u/meWhoObserves [link] [comments]  ( 9 min )
    [R] Large Language Models Represents Space and Time
    Paper - https://arxiv.org/abs/2310.02207 submitted by /u/MysteryInc152 [link] [comments]  ( 8 min )
    [R] Help Shape the Future of Machine Learning: Take Our Short Survey and Let's Create Something Amazing Together!
    Hello Redditors in r/MachineLearning We are the team behind ML Workbench, an upcoming integrated platform designed to streamline your entire machine learning lifecycle. From data preprocessing and model training to validation and deployment, we aim to make the process as seamless as possible. But here's the thing: we need your insights to build something that truly resonates with the community and solves real-world problems. 📝 Click Here to Take the Survey Why Should You Care? Unified Experience: Imagine managing all your ML tasks in one integrated environment. High-Performance Computing: We're leveraging powerful A100 GPUs to accelerate your work. User-Centric Design: Whether you're a beginner or a pro, the platform is designed to cater to all skill levels. Collaboration: Built-in features to make team collaboration effortless. What's in the Survey? The survey contains questions about your current challenges, the tools you use, and what you'd love to see in an ML platform. It should only take about 5-10 minutes to complete. Thank You Gift As a small token of our appreciation, we're offering exclusive early access to the platform for selected participants. Don't miss this chance to be among the first to experience what we're building! 📝 Click Here to Take the Survey Your feedback is crucial for us to create a tool that we hope will make a significant positive impact in the machine learning community. Thank you for taking the time to read this post and participate in our survey. Cheers, The ML Workbench Team submitted by /u/nonononottodayorever [link] [comments]  ( 9 min )
    [P] Video Event Detection
    Hi, I'm looking to create a model that given a sequence of frames from a video, returns a probability distribution over a set of events that may have occurred in those frames (probably 5 - 10 events). The training data will consist of video and hand labelled frame index/event pairs. I'm not too concerned about handling simultaneous events. It would be super helpful for some suggestions on a model architecture that would yield the best results and/or good papers/examples that achieve something similar. Thanks! submitted by /u/Dredgefort [link] [comments]  ( 9 min )
    [P] Retrieval augmented generation with OpenSearch and reranking [Video tutorial]
    I created a video tutorial that tries to demonstrate that semantic search (using embeddings) is not always necessary for RAG (retrieval augmented generation). It was inspired by the following Cohere blog post: https://txt.cohere.com/rerank/ I code up a minimal RAG pipeline: OpenSearch -> Rerank -> Chat completion (without using Langchain or similar libraries) and then see how it performs on various queries. Hope some of you find it helpful. Feel free to share any feedback@ Video link: https://youtu.be/OsE7YcDcPz0 submitted by /u/mildlyoverfitted [link] [comments]  ( 9 min )
    [R] Hacking an NLP benchmark: How to score 100 points on AMR parsing
    AMR parsing is a fun task where researchers map texts onto little graphs that explicate their meaning, so called Abstract Meaning Representations (AMRs). While arguably not the top NLP benchmark regarding popularity, research has been active for the last 10 years, including at major NLP conferences such as ACL/NAACL/EACL/EMNLP etc. Funnily, I recently found some vulnerabilities in the evaluation protocol, and if we exploit these vulnerabilities, we can get the highest score on the benchmark. To get an overview over the issue (without understanding AMR), imagine a cooking contest that takes place regularly, say, once a year. In all events, we have the same judge, participants are amateurs, meals are scored on 0 to 100, with 100 meaning “it can’t possibly get better”. Over the years, the …  ( 10 min )
    [D] Looking for an article related to machine learning in medicine to be presented at a journal club
    Hi all, I'm curious if anyone has a stand-out article they believe would prompt a lively discussion in a journal club I have coming up. Something that may have people take sides, or maybe a recent breakthrough in the ML space as it relates to clinical/health care. ​ Thanks! submitted by /u/veilofosiris [link] [comments]  ( 9 min )
    [R] Think before you speak: Training Language Models With Pause Tokens
    Paper - https://arxiv.org/abs/2310.02226 submitted by /u/MysteryInc152 [link] [comments]  ( 8 min )
    [P] Good models to use for multimodal object detection when both the modalities are image based or some object detection models which support ensembling out of the box like Yolov5?
    So basically I have a dataset with images of vehicles in top down view in both RGB and IR, what are some models I can use for both unimodal and multimodal object detection to compare their performance. Links to GitHub repos would be helpful. Thanks submitted by /u/Xyber5 [link] [comments]  ( 9 min )
    [P] Using pre-trained models as features?
    Hey everyone! Currently, I am working on a project around music emotion classifcation/regression model. Basically I am trying to predict a score to each emotion on a given song. The problem is that my dataset has quite imbalanced scores (y). Most scores are centered around a certain score range. Therefore, having difficulties predicting scores that are further away of the mean values. I had this idea to bring in pre-trained (on other datasets and problems) audio classification models into this as there are a bunch of good performing pre-trained classification models out there already. The prediction of these pre-trained models should be used as features (e.g. prediction of genre, instrument etc) beside the original spectorgram in my model. I know this won't solve the problem of imbalances in the scores but I thought maybe this could improve the performance as the model would have more features to work with. Does this make sense? I appreciate any input. submitted by /u/Kniggi [link] [comments]  ( 9 min )
    [D] LOMO underrated
    Does anyone have an idea why the LOMO optimizer (low memory optimizer) which was released a few months ago is not widely available and everyone still uses either Adam or SGD? While the paper looks really promising submitted by /u/RedMoula [link] [comments]  ( 9 min )
    [P] Camera based monitoring of infant's breathing
    Hi! I recently have seen systems that monitor breathing rate of an infant through camera. I have read several articles on that topic, where people used things like 3D camera, RGB or Interferometric Radar Sensor. Do you guys have any idea on how to accurately measure this? submitted by /u/kaina_m [link] [comments]  ( 9 min )
    [R] Towards Self-Assembling Artificial Neural Networks through Neural Developmental Programs
    submitted by /u/hardmaru [link] [comments]  ( 8 min )
    [D] How Do You Track Projects in a Scaling ML Team?"
    I am part of a Machine Learning team that has experienced significant growth recently. When we were a small team, tracking projects was straightforward. However, as the team has expanded, it's become increasingly challenging to keep track of everything. We are part of a larger corporation, so we have access to tools for creating epics and boards. However, these corporate tools are too generic and don't provide the level of detail I need for internal management. Specifically, I'm looking for a way to track model versions, dataset versions, and the overall status of our projects. I'd also like to be able to assign team members to projects. Currently, we use a MIRO board, but it's disorganized and difficult to read and update. I'd love to hear what tools or strategies you've used for similar situations, especially since our team is expected to grow even more, making tracking increasingly complex. submitted by /u/Spiritual_Narwhal649 [link] [comments]  ( 9 min )
  • Open

    Lemniscate functions
    In the previous post I said that you could define the inverse sine as the function that gives the arc length along a circle, then define sine to be the inverse of the inverse sine. The purpose of such a backward definition is that it generalizes to other curves besides the circle. For example, it […] Lemniscate functions first appeared on John D. Cook.  ( 5 min )
    Generalized trigonometry
    In a recent post I mentioned in passing that trigonometry can be generalized from functions associated with a circle to functions associated with other curves. This post will go into that a little further. The equation of the unit circle is and so in the first quadrant The length of an arc from (1, 0) […] Generalized trigonometry first appeared on John D. Cook.  ( 5 min )
  • Open

    A Mine-Blowing Breakthrough: Open-Ended AI Agent Voyager Autonomously Plays ‘Minecraft’
    For NVIDIA Senior AI Scientist Jim Fan, the video game Minecraft served as the “perfect primordial soup” for his research on open-ended AI agents. In the latest AI Podcast episode, host Noah Kravitz spoke with Fan on using large language models to create AI agents — specifically to create Voyager, an AI bot built with Read article >  ( 6 min )
    How AI Helps Fight Wildfires in California
    California has a new weapon against the wildfires that have devastated the state: AI. A freshly launched system powered by AI trained on NVIDIA GPUs promises to provide timely alerts to first responders across the Golden State every time a blaze ignites. The ALERTCalifornia initiative, a collaboration between California’s wildfire fighting agency CAL FIRE and Read article >  ( 6 min )
  • Open

    LLMs May Be The Trojan Horse That Modernizes Software Development
    submitted by /u/geekteam6 [link] [comments]
    Why PepsiCo is powering your snacks with AI
    Using AI to improve Cheetos? That's something PepsiCo has experimented with. On today’s POLITICO Tech, Athina Kanioura, chief strategy and transformation officer for PepsiCo, says that using AI to make employees faster and more efficient hasn’t led PepsiCo to replace human workers as many fear. And why the company has determined that in some jobs the technology is simply off limits. Listen to the interview here: https://politico-tech.simplecast.com/episodes/why-pepsico-is-powering-your-snacks-with-ai submitted by /u/smo279 [link] [comments]
    New Paper: Enabling Language Models to Implicitly Learn Self-Improvement From Data
    LLMs keep getting more capable at generating natural language. But there's always room for improving the quality and alignment of their responses. Typically this requires lots of human effort to collect more training data. So researchers are exploring ways for models to self-improve without human involvement. Many methods use prompting - giving the LLM instructions to critique and refine its responses. But coming up with comprehensive prompts is challenging. The new approach proposed, called PIT, lets models learn self-improvement implicitly from human preference data instead. It reformulates reinforcement learning to maximize the gap between an original response and improved response conditioned on the original. This taps into the implicit guidance in the preference data on what constitutes better quality, so no manual rubrics are needed. PIT uses curriculum reinforcement learning - first improving easy references, then switching to the LLM's own samples. Experiments on real and synthetic datasets show PIT significantly outperforms prompting methods like Self-Refine. It improved response quality 7-34% across conditions without any human involvement. This demonstrates a promising direction for LLMs to align better with human preferences autonomously as they learn from experience. No need for human bottlenecks when expanding to new domains or underserved use cases. Very cool! TLDR: New method PIT enables LLMs to implicitly learn to refine themselves from human preference data, no prompts needed. Big improvement over prompting approaches. Full Summary Arxiv is here: https://arxiv.org/abs/2310.00898 submitted by /u/Successful-Western27 [link] [comments]
    $5k in grants or $250k funding for AI startups. Backed by OG's
    AI Grant is offering $5k in grants or $250k in funding for AI startups. The program is backed by OG's AI Grant, an accelerator for AI startups. The grant includes an uncapped SAFE investment of $250,000 for AI-native product startups, $350,000 in Azure credits, a summit in San Francisco with advisors and founders, and various other startup benefits and credits. The program was created by Nat Friedman and Daniel Gross. Applications for Batch 3 will open in a few months, but early applications are accepted. The program is open to anyone, and it is looking for companies or projects that leverage AI models in a useful or engaging way. Source : https://aigrant.com/ submitted by /u/NuseAI [link] [comments]
    AI will teach everyone to read and write. It's already begun.
    https://www.imagineworldwide.org/ "What is Child-Directed, Tech-Enabled Learning? Children drive their own learning, at their own pace, using software that provides a complete, research-based curriculum and pedagogy. Adults play a supportive, facilitative role. The software is delivered to the learner on a tablet, without connectivity, and charged by solar power or other appropriate energy sources... With hundreds of millions of children out of school or lacking access to effective schooling, this model can provide every child, everywhere access to learning. Solutions can work without internet access or grid power. Adults play facilitative, rather than instructional, roles. The annual unit cost of the learning solution is less than $7 per child and declining. This includes hardware, software, accessories, power, shipping, and implementation support from Imagine." submitted by /u/Georgeo57 [link] [comments]
    AI is replacing customer service jobs across the globe
    Artificial intelligence (AI) is replacing customer service jobs around the world, with chatbots being used to interact directly with customers and solve problems independently. This shift is expected to have a profound effect on economies, particularly in countries like India and the Philippines where call centers provide millions of jobs. While some argue that AI will provide support to remaining call center workers and improve job satisfaction, others warn that it could lead to job losses and a need for workforce adaptation. The use of AI software tools in call centers has shown potential for improving productivity and customer satisfaction. Source : https://www.washingtonpost.com/technology/2023/10/03/ai-customer-service-jobs/ submitted by /u/NuseAI [link] [comments]
    Female-founded AI startups win just 2% of funding deals in UK
    Female-founded AI startups in the UK account for just 2% of funding deals over the past decade, according to a report by the Alan Turing Institute. When female-founded companies do secure funding, they raise an average of £1.3m per deal, compared to £8.6m raised by all-male founder teams. The report highlights the urgent need for gender balance in AI investment, as the industry is predicted to grow significantly in the coming years. Recommendations to improve gender balance include improving recruitment, monitoring investment practices, and diversifying the ecosystem. There is an increasing demand for generative AI products, with leading tech companies investing heavily. Gender diversity gaps and uneven progress rates for ethnic and racial groups are observed across investment firms. AI products have shown biases, such as passport checkers working less efficiently with darker skin and tools reinforcing gender stereotypes. In 2019, a UN agency found that assigning female genders to digital assistants like Siri and Alexa perpetuated harmful gender biases. Source : https://www.theguardian.com/technology/2023/oct/04/female-founded-ai-startups-win-just-2-of-funding-deals-in-uk submitted by /u/NuseAI [link] [comments]
    I used Riffusion (Stable Diffusion, but for music) to turn my own music into "jazz", "Radiohead", "Muse" or "Nirvana" songs, I'm amazed by the results
    submitted by /u/cI_-__-_Io [link] [comments]
    Visa Announces $100 Mn Fund for Generative AI Companies
    submitted by /u/Agitated-Spell3979 [link] [comments]
  • Open

    My Impressions (and Application) of the Heidelberg Laureate Forum 2023
    This September, I had the chance to attend the Heidelberg Laureate Forum (HLF) for the second — and probably last — time. The HLF is an incredible experince for young researchers: Mirroring the Lindau Nobel Laureate Meetings, the organizers invite laureates from math and computer science together with young researchers pursuing their undergraduate, graduate or post-doc studies. In this article, I want to share impressions and encourage students to apply next year! The post My Impressions (and Application) of the Heidelberg Laureate Forum 2023 appeared first on David Stutz.  ( 7 min )
  • Open

    Simplify medical image classification using Amazon SageMaker Canvas
    Analyzing medical images plays a crucial role in diagnosing and treating diseases. The ability to automate this process using machine learning (ML) techniques allows healthcare professionals to more quickly diagnose certain cancers, coronary diseases, and ophthalmologic conditions. However, one of the key challenges faced by clinicians and researchers in this field is the time-consuming and […]  ( 11 min )
    Create an HCLS document summarization application with Falcon using Amazon SageMaker JumpStart
    Healthcare and life sciences (HCLS) customers are adopting generative AI as a tool to get more from their data. Use cases include document summarization to help readers focus on key points of a document and transforming unstructured text into standardized formats to highlight important attributes. With unique data formats and strict regulatory requirements, customers are […]  ( 9 min )
    Automate prior authorization using CRD with CDS Hooks and AWS HealthLake
    Prior authorization is a crucial process in healthcare that involves the approval of medical treatments or procedures before they are carried out. This process is necessary to ensure that patients receive the right care and that healthcare providers are following the correct procedures. However, prior authorization can be a time-consuming and complex process that requires […]  ( 7 min )
  • Open

    Scalable spherical CNNs for scientific applications
    Posted by Carlos Esteves and Ameesh Makadia, Research Scientists, Google Research, Athena Team Typical deep learning models for computer vision, like convolutional neural networks (CNNs) and vision transformers (ViT), process signals assuming planar (flat) spaces. For example, digital images are represented as a grid of pixels on a plane. However, this type of data makes up only a fraction of the data we encounter in scientific applications. Variables sampled from the Earth's atmosphere, like temperature and humidity, are naturally represented on the sphere. Some kinds of cosmological data and panoramic photos are also spherical signals, and are better treated as such. Using methods designed for planar images to process spherical signals is problematic for a couple of reasons. Firs…  ( 92 min )
  • Open

    Why DQN method is only suitable for small discrete action space? What is the issue if action space is large and continous?
    submitted by /u/aabra__ka__daabra [link] [comments]
    Up to date Metaworld documentation
    Hello everyone, I want to start experimenting with the domain of multi-tasking and meta-learning, thus I pip installed metaworld which is currently on version 2.0.0 if I'm not mistaken. I wanted to ask in case anybody knows, if there's any recent updated documentation, because the farama foundation on GIthub which is probably responsible for maintaining the Metaworld, has outdated code and documentation. (for example, presented code on Github's README has the command env.step(a) which returns 4 values instead of 5 that newer version outputs). From what I understand, they gather contributors for a big push regarding code and documentation on GItHub, where they will make up things up to date again but this announcement was 7 months ago. Sorry for the potentially wrong format of this question-post, I'm relatively new to reddit. I would appreciate any further knowledge on this topic and thanks everyone who's taking the time to read it! ​ Metaworld Distribution from Farama Foundation on Github: https://github.com/Farama-Foundation/Metaworld submitted by /u/South_Book_5625 [link] [comments]
    The future of game testing is here, and it is powered by Artificial Intelligence! 🔥
    Hi everyone! We used our opensource library SheepRL 🐑 and our PyTorch implementation of DreamerV3 on Crafter, an open-world survival game, featuring randomly generated 2D worlds, in which players have the freedom to explore a large and expansive map and need to forage for food, collect materials, build tools and find shelter. Here is a short video 👉 https://youtu.be/7XEBT2msUUQ In open-world games, ensuring they are playable and bug-free is crucial, but is becoming increasingly difficult and time-consuming using manual game testing. Maximizing exploration using Reinforcement Learning is extremely useful for testing games at scale, because of the wide variety of gameplay scenes the player may encounter. Why is the test on Crafter so interesting for game testing? Because Crafter evaluates a large number of general capabilities related to the RL agent, like strong ability to generalise (new generated maps for each episode), to deal with partial observability (each input image reveals only a small part of the world) and to long-term reasoning and survival. These abilities are very useful for testing games at scale, providing developers with insights to optimise gameplay and player experience. The future of game testing is here, and it is powered by Artificial Intelligence! 🔥 --- ❌ Are you interested in joining the project community? Get in touch 👉 https://github.com/Eclectic-Sheep/sheeprl ❌ SheepRL 🐑 is open-source, fully written in PyTorch and accelerated with LightningFabric - by Lightning AI. Feel free to use it for your AI projects, and if you want to contribute, we are more than happy to accept your pull requests! ❤️ submitted by /u/Manu_Orobix [link] [comments]
    Can I use Continuous algorithms (e.g. TD3) for Discrete Action spaces?
    My environment has hybrid action spaces and I was wondering if I can use continuous algorithms for discrete action spaces. I'm asking this because, well, agent can't learn and I'm trying to find the source of error. I was wondering if this was the source of problem. ​ My Assumptions On Solving This Problem: - Discrete is subspace of continuous, thus continuous algorithms will be able to handle discrete action spaces as well. - A non-hybrid action-space algorithm will be simpler than hybrid-action-space algorithms. ​ Method (I'm only describing the discrete action here): - Use TD3 as the training algorithm. No modification from the original training code. TD3 algorithm has been verified on Pendulum and other environments created for unit test purposes. - Policy network outputs the a…

  • Open

    Video Game Voice Actors Are Ready to Strike over AI. Here’s Why
    Video game voice actors are prepared to go on strike over the use of AI in game development. The current contract negotiations between the Screen Actors Guild-American Federation of Television and Radio Artists (SAG-AFTRA) and video game companies have stalled, with the major issues being pay raises and the use of AI to alter or generate actors' performances. SAG-AFTRA wants protections for its members to ensure their work is not stolen or replaced by AI. If negotiations don't progress, voice actors, stunt artists, and motion capture performers could potentially go on strike, leading to delays in game releases and recasting of beloved performers. The voice actors' strike in 2016 resulted in improvements to pay, and now they are prepared to strike again to fight for their rights. Video game performances are often seen as assets to be extracted and inserted into games, rather than recognizing the humanity and quality of life of the performers. The use of AI in game development raises concerns about how companies will use advances in generative AI to steal work or put performers out of a job. SAG-AFTRA wants transparency, consent, and compensation when it comes to the use of AI in games. Members of SAG-AFTRA have voted in favor of authorizing a strike, meaning voice actors, stunt artists, and motion capture performers could potentially join the picket line if negotiations don't progress. The strike could lead to delays in upcoming game releases and the recasting of performers if companies refuse to meet the union's demands. The fight for voice actors' rights is an existential one, as they want to retain the rights to their own voices and images and achieve wages that keep up with inflation Source : https://kotaku.com/sag-aftra-strike-voice-actor-spider-man-ai-union-1850874117 submitted by /u/NuseAI [link] [comments]  ( 9 min )
    [Question] Any 3X AI?
    Wanted to see if there are any 3X AI generated images available? I’m looking to see how I could use AI to generate images for my website. submitted by /u/IamMoe8868 [link] [comments]  ( 8 min )
    TikTok ran a deepfake ad of an AI MrBeast hawking iPhones for $2
    TikTok ran an ad featuring a deepfake of MrBeast offering iPhone 15 Pros for $2. AI-generated deepfake content is becoming more pervasive on social media platforms. Platforms like TikTok are facing challenges in moderating and handling the rise of AI deepfakes. MrBeast raised concerns about the ability of social media platforms to handle AI deepfakes. TikTok removed the ad and associated account for policy violations. Unauthorized AI-generated content featuring celebrities is a growing problem in platform advertising. The issue is expected to worsen as AI technology improves and becomes more accessible. Transparency and disclosure are crucial in AI-generated ad content featuring celebrities. TikTok is aware of the pervasiveness of AI-generated content on its platform and is taking steps to address it. Source : https://www.businessinsider.com/tiktok-ran-deepfake-ad-mrbeast-as-ai-generated-content-spreads-2023-10 submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Infinitia will apparently let you create your own AI enabled social simulations
    Came across this upcoming game which supposedly let's you create your own worlds and characters to live in the world...they also released a research paper explaining how they're doing it, using LLMs in all sorts of ways, primarily for reasoning and language. I think it could be a pretty fun take on passive games, just populating a world with your characters, checking up on them occasionally, putting them in weird situations lol. infinitia.ai for those who wanna check it out The NPCs do seems to be acting in an interesting way, as i saw in this video they posted on twitter... https://twitter.com/infinitia_app/status/1707102187518628245 ​ Watchall think? Another smallville clone? or something interesting.... submitted by /u/SeaJeweler3723 [link] [comments]  ( 9 min )
    Efficient AI design of robots.
    submitted by /u/DrJosh [link] [comments]  ( 8 min )
    From Stone to Silicon: The Odyssey of Humanity and Technology
    submitted by /u/Einsof__ [link] [comments]  ( 8 min )
    Don't Worry, AI Cannot Takeover the World, It Will Run Out of Battery
    The article discusses the importance of batteries in AI technology and how they limit the capabilities of AI robots. It explores the challenges of current battery technology and the need for better solutions. The article emphasizes the significance of developing ideal batteries that can provide long-lasting power without degradation. Source : https://notes.arkinfo.xyz/p/dont-worry-ai-cannot-takeover-the submitted by /u/NuseAI [link] [comments]  ( 9 min )
    GPT-4 outperforms its rivals in new AI benchmark suite GPT-Fathom
    ByteDance and the University of Illinois researchers have developed an improved benchmark suite with consistent parameters, called GPT-Fathom, that indicates GPT-4, the engine behind the paid version of ChatGPT, significantly outperforms leading LLMs, including its biggest competitor, Claude 2. For the latest advancements in AI, look here first. ​ https://preview.redd.it/v4fo8zser0sb1.png?width=1292&format=png&auto=webp&s=7e29fe9ac1af3efcb936ee61e9202717eed7e702 GPT-Fathom's breakthrough The new benchmark suite, GPT-Fathom, addresses consistent settings issues and prompt sensitivity, attempting to reduce inconsistencies in LLM evaluation. In a comparison using GPT-Fathom, GPT-4 outperformed over ten leading LLMs, crushing the competition in most benchmarks, and showing significant performance leaps from GPT-3 to its successors. Performance specifics The gap in performance was especially pronounced against Claude 2, ChatGPT's biggest rival. GPT-4's Advanced Data Analysis model exhibited superior performance in coding, giving it an edge as compared to LuckLlama 2, the current best-performing open-source model. Llama 2-70B showed comparable or better performance than gpt-3.5-turbo-0613 in safety and comprehension but displayed worse performance in "Mathematics", "Coding", and "Multilingualism". The seesaw effect The research team noted a 'seesaw effect' where an improvement in one area can lead to degradation in another. For instance, GPT-4 saw a performance drop on the Mathematical Geometry Simple Math (MGSM) benchmark, despite improving its performance significantly on the text comprehension benchmark DROP. (source) P.S. If you like this kind of analysis, I write a free newsletter that tracks the most relevant news and developments in AI. Professionals from Meta, Google, and OpenAI are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    Runway has launched Gen 2 Director mode. The speed at which this company works is Insane
    submitted by /u/First_Development101 [link] [comments]  ( 8 min )
    How Ai Could Upend Foreign Policy - An Interview with Ian Bremner and Mustafa Suleyman
    submitted by /u/finger_puppet_self [link] [comments]  ( 8 min )
    CGPT-4 on some ways "prompt/response" posts advance AI
    Prompt: Explain why posting prompts and your responses speeds up AI development in the following ways: It allows us to analyze responses to prompts that we would probably never have thought of. It allows us to share ideas that the open source community would advance. It allows us to better democratize the process of alignment, bringing in as many people as possible. CGPT-4: Posting prompts and AI responses serves multiple functions that are beneficial to the progress of AI development. First off, it exposes AI to a myriad of queries that developers might not have otherwise considered. This wide range of prompts allows the machine to improve its natural language processing abilities, making it more versatile and effective in interpreting and responding to human language. The more dive…  ( 9 min )
    A.I Makes a Video game on the App Store
    submitted by /u/usmansid98 [link] [comments]  ( 8 min )
    Infinite context windows? Streaming LLMs can be extended to infinite sequence lengths without any fine-tuning.
    LLMs like GPT-3 struggle in streaming uses like chatbots because their performance tanks on long texts exceeding their training length. I checked out a new paper investigating why windowed attention fails for this. By visualizing the attention maps, the researchers noticed LLMs heavily attend initial tokens as "attention sinks" even if meaningless. This anchors the distribution. They realized evicting these sink tokens causes the attention scores to get warped, destabilizing predictions. Their proposed "StreamingLLM" method simply caches a few initial sink tokens plus recent ones. This tweaks LLMs to handle crazy long texts. Models tuned with StreamingLLM smoothly processed sequences with millions of tokens, and were up to 22x faster than other approaches. Even cooler - adding a special "[Sink Token]" during pre-training further improved streaming ability. The model just used that single token as the anchor. I think the abstract says it best: We introduce StreamingLLM, an efficient framework that enables LLMs trained with a finite length attention window to generalize to infinite sequence length without any fine-tuning. We show that StreamingLLM can enable Llama-2, MPT, Falcon, and Pythia to perform stable and efficient language modeling with up to 4 million tokens and more. TLDR: LLMs break on long convos. Researchers found they cling to initial tokens as attention sinks. Caching those tokens lets LLMs chat infinitely. Full summary here Paper link: https://arxiv.org/pdf/2309.17453.pdf submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    Where do I produce free intro and outro AI music for my Podcast for free.
    I am starting a podcast on Psychology and Philosophy submitted by /u/21bce [link] [comments]  ( 8 min )
    BackerKit Will Restrict the Use of AI Art
    Crowdfunding site BackerKit has announced a new policy that restricts the use of solely AI-generated content on its platform. The policy aims to address concerns regarding ownership of content, ethical sourcing of data, and compensation for the process of creating content. Projects that lack a minimum requirement of human input will not be allowed to crowdfund on the BackerKit site. There is some flexibility with AI generative fill and the use of AI transcription services, but a high level of human input is required to satisfy the policy. BackerKit will automatically exclude all content uploaded by creators for their projects from AI training in support of this policy. The new restrictions will go into effect on October 4, giving creators time to alter their projects if they are using AI-generated images and text. Source : https://gizmodo.com/backerkit-ai-art-new-policy-crowdfunding-generative-1850891882 submitted by /u/NuseAI [link] [comments]  ( 9 min )
    One-Minute Daily AI News 10/2/2023
    iPhone designer Jony Ive is reportedly talking to OpenAI CEO Sam Altman about making an AI hardware device.[1] Visa announced today that it plans to invest $100 million in companies developing generative AI technologies and applications “that will impact the future of commerce and payments.”[2] More than 40% of labor force to be affected by AI in 3 years, Morgan Stanley forecasts. [3] Tom Hanks: Don't fall for "AI version of me" promoting dental plan.[4] Sources: [1] https://www.businessinsider.com/chatgpt-head-iphone-designer-jony-ive-ai-device-openai-report-2023-9?amp [2] https://techcrunch.com/2023/10/02/visa-earmarks-100m-to-invest-in-generative-ai-companies/ [3] https://www.cnbc.com/2023/10/02/more-than-40percent-of-labor-force-to-be-impacted-by-ai-in-three-years-morgan-stanley-forecasts.html [4] https://www.cbsnews.com/amp/news/tom-hanks-ai-version-of-me-promoting-dental-plan/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
  • Open

    [D] What are some effective dimensionality reduction (unsupervised feature selection) techniques for a high dimensional, sparse dataset?
    I am considering comparing mutual information scores, but I also don't think I understand MI well enough. For example, I(X;Y) = H(X) + H(Y) - H(X,Y). To me, visualizing H(X) and H(Y) as venn diagrams and H(X,Y) as the information from both X, Y (like an overlapping venn diagram) makes me think that when X, Y are disjoint, then MI is 0 and when X, Y overlap completely, then the MI score will be high. So, I'm thinking that a high MI value is "bad" since this means X, Y would be redundant. I am not sure if my understanding here is correct. Another method I have tried is to binarize the data for each feature (represented as rows in my dataset) using "present" (1) and "absent" (0). The main issue I have run into doing this is that I am trying to then create a distribution to compare the fea…  ( 10 min )
    [D] Best interface to use LLMs for code: Chat or completion?
    Hi everyone, I am quite interested in understanding what are the feedback from the community in terms of interface to leverage LLMs for code productivity. Because LLMs tend to do mistake I have mostly used Chat-like interfaces, like ChatGPT, as they allow to interact with the model and converge to a conclusion. I haven't used Copilot for a while but my feeling was that it could do some boilerplate correctly but then it quickly started suggesting code that would be misleading and could actually hurt productivity. It might have changed since then but that was my feeling back then. What is your favorite option and why? View Poll submitted by /u/Separate-Still3770 [link] [comments]  ( 9 min )
    [D] ML input data has to be derived from a larger dataset
    Hello everyone. I am curious to know if anyone has encountered a ML problem like this and if so, I seek your advice. Usually in ML classification such as the IRIS dataset, each row represents a sample and each column a parameter, right ! My problem is that my ML classification parameters have to be derived from a range of values (parent data). I have taken mean of the parent values to generate the parameters for the ML input data. This results in lower classification accuracies using Random forest and XGBoost. Has anyone encountered a similar situation like this where the data has to be generated from a range of other datasets? Is there any other way to do this? I did not find any papers or articles from the web so just asking. I can generate additional parameters from other statistics such as median, standard deviation etc. which can improve the classification accuracy but can make interpretation of the results a little weird, domain wise. I wish to avoid this if possible. submitted by /u/notmyfault7676 [link] [comments]  ( 9 min )
    [D] Book review for Meta's ML Design interview? Machine Learning System Design Interview (by Ali Aminian and Alex Xu)
    I'm preparing for the ML system design interview for Meta, and I searched for various resources. This book (ML System Design Interview (by Ali Aminian & Alex Xu)) seems like a solid structured resource that covers solutions to case studies in detail. Has anyone used it to prepare for Meta's ML System Design interview? Thoughts? Khang's book doesn't seem to have great reviews. Chip Huyen's book (Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications) doesn't seem very focused on interview prep?? Also, happy to hear about other cool resources to prepare. Thanks very much! submitted by /u/irEFrienfk [link] [comments]  ( 9 min )
    [R] Open X-Embodiment: Robotic Learning Datasets and RT-X Models - DeepMind 2023 - RT-X exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms!
    Blog: https://www.deepmind.com/blog/scaling-up-learning-across-many-different-robot-types https://robotics-transformer-x.github.io/ here you can also find the Datasets and Code! Paper: https://robotics-transformer-x.github.io/paper.pdf Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train “generalist” X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. https://preview.redd.it/oxzutrhtb1sb1.jpg?width=1693&format=pjpg&auto=webp&s=37b8b1dbf5f489dc2c8eaca4d15cb9c32ebc2660 https://preview.redd.it/ldsiwshtb1sb1.jpg?width=1494&format=pjpg&auto=webp&s=fdbf0f91c705acf11bff854f6d6af82dddd47021 https://preview.redd.it/ikk18jitb1sb1.jpg?width=1693&format=pjpg&auto=webp&s=e50b443dc4b0266a0480d54c4f92a0b708485797 https://preview.redd.it/t5wmciitb1sb1.jpg?width=1361&format=pjpg&auto=webp&s=2971fd645acb6dcbed2ca3522e311d0772c45964 ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [D] Biggest problems with ML in industry?
    For all my corporate ML engineers I have a question, what are the most annoying / biggest problems you face when developing/deploying ML in industry? This can be anywhere from data, to tuning, to even MLOPS. submitted by /u/hai_cben [link] [comments]  ( 9 min )
    [D] Difficulty with paper implementations on google colab
    I am not from CS background, my knowledge is from online courses and books. All of which used some variation of Jupyter notebook. My knowledge of code can be lacking sometimes, since I am not from CS background. I am trying to implement some computer vision paper codes on newer samples. I understand the papers, and the underlying mechanisms. However, I fail to decipher the codes provided with the associated github repository. Usually, these repository contains information on how to recreate the experiment on some specific data using shell. But I am using google Colab for this purpose, as I don't have access to GPU, and I found it impossible to recreate the experiments in the google Colab, using shell commands, let alone extend it to newer samples. I would appreciate some help in this regard, I haven't done this before, and there aren't really any tutorial/resource on how to do this. Ideally, what I am trying to do is separate the model, input some images, get the output, and interpret it. I am stuck, and I would really appreciate some help or advice in this regard. Right now I am trying to work with this paper, meta ood I would appreciate any help/advice/resource anything. I feel very lost. Thanks in Advance. submitted by /u/franticpizzaeater [link] [comments]  ( 9 min )
    Repurposing a personal desktop computer [P]
    Hello! I'm debating turning my old desktop (old CPU but relatively new GPU 3980 or 90) into a ML box that I can remote into. I'm sure people here have done something similar and I was wondering if anyone could point me towards some resources for getting it off the ground/any pitfalls to avoid/suggestions. I'm an active data scientist researcher for my job and this would just be for fun side projects but I have some pretty glaring holes in my knowledge of computers (like the best way to set this up - should I uninstall windows install unbuntu or is windows fine?) Honestly I'm sure my ignorance will be pretty apparent from the questions I'm asking/not asking so any advice anyone has would be welcome! Thanks! Sorry if this is the wrong subreddit for this sort of thing. ​ submitted by /u/shebaiscool [link] [comments]  ( 9 min )
    [R] Generative memory: generative diffusion models are equivalent to modern Hopfield nets
    https://arxiv.org/abs/2309.17290 submitted by /u/LucaAmbrogioni [link] [comments]  ( 8 min )
    [D] Stuck in Automation of AI models
    Hello everyone! ​ I'm currently working on a project and have hit a roadblock in automating the deployment of my machine-learning models. Can anyone provide guidance on the best practices or tools for streamlining the deployment process? Specifically, I'm looking to create a seamless workflow where models can be easily uploaded, deployed on the cloud, and accessible through APIs. Any insights or advice would be greatly appreciated! ​ Automation!!! submitted by /u/homelander81 [link] [comments]  ( 9 min )
    [P] The Case of the Missing Masterpiece
    Hi, I just wanted to share an applied image classification problem that I worked on a few years ago: https://vdalv.github.io/2018/09/01/missingMasterpiece.html submitted by /u/vdalv [link] [comments]  ( 9 min )
    Need to build a XAI model to explain the behaviour of an IDS [P]
    Hello, I need help from someone that knows about XAI. I have to create a XAI model to intérprete the resulta of an AI model, an MLP, that works as an IDS classifier. I have no idea on how to do It and I have been completely blocked for 2.5 years. This is the final project of my career and I just don't know how to do It, and my tutor isn't very helpful. If anyone is able to help I would explain him what I have to do and would be very grateful. Thanks for your help submitted by /u/elMandarine [link] [comments]  ( 9 min )
    [D] Optimal scheduling tool with AI/ML recommendations
    Hello all, I'm trying to plan out for a new web platform development for workforce management but have little experience. We all know that hard coding can be done for general scheduling, including manager polling shifts based on labor category, staff assignments, conflt resolving, emergency scheduling, etc. But what I want to research to is....how can I ensure that one optimal schedule is automatically computed using AI/machine learning tools so that I don't have to go through the list of hard-coded generated schedules (I’m sure these will work fine, but still want to compute one ultimate schedule). submitted by /u/Playful-Bed-2183 [link] [comments]  ( 9 min )
    [R] Break-A-Scene: Extracting Multiple Concepts from a Single Image
    ​ Break-A-Scene: Given a single image with multiple concepts, annotated by loose segmentation masks, our method can learn a distinct token for each concept, and use natural language guidance to re-synthesize the individual concepts or combinations of them in various contexts. Project Page: https://omriavrahami.com/break-a-scene/ Code is publicly released! Abstract Text-to-image model personalization aims to introduce a user-provided concept to the model, allowing its synthesis in diverse contexts. However, current methods primarily focus on the case of learning a single concept from multiple images with variations in backgrounds and poses, and struggle when adapted to a different scenario. In this work, we introduce the task of textual scene decomposition: given a single image of a scene that may contain several concepts, we aim to extract a distinct text token for each concept, enabling fine-grained control over the generated scenes. To this end, we propose augmenting the input image with masks that indicate the presence of target concepts. These masks can be provided by the user or generated automatically by a pre-trained segmentation model. We then present a novel two-phase customization process that optimizes a set of dedicated textual embeddings (handles), as well as the model weights, striking a delicate balance between accurately capturing the concepts and avoiding overfitting. We employ a masked diffusion loss to enable handles to generate their assigned concepts, complemented by a novel loss on cross-attention maps to prevent entanglement. We also introduce union-sampling, a training strategy aimed to improve the ability of combining multiple concepts in generated images. We use several automatic metrics to quantitatively compare our method against several baselines, and further affirm the results using a user study. Finally, we showcase several applications of our method. ​ submitted by /u/sgd_is_all_you_need [link] [comments]  ( 9 min )
    [R] MIT, Meta, CMU Researchers: LLMs trained with a finite attention window can be extended to infinite sequence lengths without any fine-tuning
    LLMs like GPT-3 struggle in streaming uses like chatbots because their performance tanks on long texts exceeding their training length. I checked out a new paper investigating why windowed attention fails for this. By visualizing the attention maps, the researchers noticed LLMs heavily attend initial tokens as "attention sinks" even if meaningless. This anchors the distribution. They realized evicting these sink tokens causes the attention scores to get warped, destabilizing predictions. Their proposed "StreamingLLM" method simply caches a few initial sink tokens plus recent ones. This tweaks LLMs to handle crazy long texts. Models tuned with StreamingLLM smoothly processed sequences with millions of tokens, and were up to 22x faster than other approaches. Even cooler - adding a special "[Sink Token]" during pre-training further improved streaming ability. The model just used that single token as the anchor. I think the abstract says it best: We introduce StreamingLLM, an efficient framework that enables LLMs trained with a finite length attention window to generalize to infinite sequence length without any fine-tuning. We show that StreamingLLM can enable Llama-2, MPT, Falcon, and Pythia to perform stable and efficient language modeling with up to 4 million tokens and more. TLDR: LLMs break on long convos. Researchers found they cling to initial tokens as attention sinks. Caching those tokens lets LLMs chat infinitely. Full summary here Paper link: https://arxiv.org/pdf/2309.17453.pdf submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [D] Really good dataset for a Course Capstone
    Hey everyone! My friends and I are taking a Data Science course in our university. We are modestly versed in ML/DL techniques, and want to use everything we know on a really good capstone project for this course. We are looking for a dataset where we can demonstrate a nice variety of techniques to really blow the socks off our Professor. Ideally we'd like this to be stemming from something basic that most would consider "Data Science", as in something with a tabular dataset and elements of classification. Though we still want chances to bring in what we know from outside the course: for example, if there's images to supplement the dataset we could use Image Classification models or something multimodal to bring in more features, if there's natural language data then we could use LLMs to extract salient features etc. More importantly though, we want something whose exploration can be really motivated so it doesn't seem we're only in it for the ML aspect. Thank you! submitted by /u/Subject-Revolution-3 [link] [comments]  ( 9 min )
    [D] Competitiveness in ML research
    I've been diving deep into the world of machine learning research, and I'm genuinely baffled: how on Earth do some researchers seem to pump out paper after paper? I mean, there's only 24 hours in a day, right? Are academic minions (i.e. PhD students) doing all the heavy lifting? Or maybe some highly efficient workflows I'm not privy to? On a more serious note, I would like a career in ML, and the sheer volume and pace of these publications is making me feel a bit disheartened. How is this prolificity possible? Any words of encouragement or advice? submitted by /u/blabboy [link] [comments]  ( 9 min )
    [D] Why should I use a hosted/cloud VectorDB solutions over a serverless or vector store?
    Why the hell should i use cloud based or server hosted solution over a easy peasy servless variant like lancedb or even faiss vector store is enough for most of the use cases on small-medium I often see posts like "oh my stack is... pinecone Chroma weaviate_io" And they just ingest minisets of data, what the hell man submitted by /u/Dear_Bullfrog193 [link] [comments]  ( 9 min )
    [P] FontoGen: generating true-type fonts
    I'd like to share a project that I've spent a few weekends working on. FontoGen is an autoregressive encoder-only transformer model that's capable of generating true-type fonts. GitHub: https://github.com/SerCeMan/fontogen Weights: https://huggingface.co/SerCe/fontogen Blog post with more details: https://serce.me/posts/02-10-2023-hey-computer-make-me-a-font The project is largely an exploration of whether generating fonts natively, line by line, is possible. I'm not aware of any previous research that would achieve the same results for complete fonts previously. This is my first ML-specific project, and I would appreciate any feedback on the model architecture, and I'm also happy to answer any questions you may have. submitted by /u/SerCeMan [link] [comments]  ( 9 min )
    [D] What happens after removing the causal mask of LLaMA?
    The causal mask in LLaMA serves as a protective barrier to prevent information leakage. However, in certain tasks, leveraging information leakage can be a beneficial strategy for enhancing performance, particularly in tasks like token classification, such as Named Entity Recognition (NER). Interestingly, the paper titled "Label Supervised LLaMA Finetuning" (available at https://arxiv.org/abs/2310.01208) reveals a significant performance boost in token classification when the causal mask is removed. submitted by /u/seanlee97 [link] [comments]  ( 9 min )
    [R] RA-DIT: Retrieval-Augmented Dual Instruction Tuning
    New paper that proposes instruction-tuning with in-context retrieval-augmentation to improve SOTA LLMs in cases where access to large, external knowledge sources is needed. Tested on LLaMA 65B, 13B and 7B. https://arxiv.org/abs/2310.01352 submitted by /u/todpole3 [link] [comments]  ( 9 min )
    [D] How do you scale computational intensive Python scripts?
    Hey ML Community, I'm wondering how people currently go about scaling their Python programs? Lets say for instances you're doing batch inference using an LLM. Each prediction takes 2-3 minutes to process, how would you go about scaling that to make a million predictions? I'm asking this question because a few months back I started building a tool to quickly parallelize python functions across thousands of machines in the cloud. I'm focused on making the barrier to interact with the cloud extremely low and want to know all the core alternatives out there. Also, if you have any advice on starting a business I'd love to hear it. submitted by /u/Ok_Post_149 [link] [comments]  ( 9 min )
    [D] What is the highest quality automatic image captioning solution?
    I make very high quality Lora's and finetuned stable diffusion models. These models yield very good results, but more importantly they are very easy to use as I have always captioned my images as one would use natural spoken language (no weird booru tags and all that jazz). The most labor intensive processes in the workflow is image captioning. For example, my last project had almost 10000 images in the data set. Every single image was manually captioned by me as the quality of all automated solutions I tried is subpar and has too many accuracy issues. I have tried Blip auto captioning and LLava, but they still were not accurate enough for what I needed. I am hoping someone here can suggest a solution, if one exists, thanks. submitted by /u/no_witty_username [link] [comments]  ( 9 min )
    [D] (Interview Help) Do you know any good resources for interview case studies in the finance domain (especially dealing in loan and credit cards)
    I'm preparing for a data science interview and am looking for case study prep resources, especially for the financial domain (loans and credit cards). Mainly, I want to understand some good metrics for the financial domain, ways to break down the questions and create a rough data model, kinds of conditions to take into consideration (eg. Seasonality), kinds of effects that can be used expected (like opportunities and risks), etc. Any resources or help is greatly appreciated! submitted by /u/how_the_turn_tablez [link] [comments]  ( 9 min )
  • Open

    Help Restricting Actions
    Hello, I am new to RL, I am currently working on a school project that requires it. I am working on making a model to play a game very similar to wordle, so for the function of this post it may as well be wordle. Right now I am trying to get it to work with this gym https://github.com/zach-lawless/gym-wordle, and I will make my tweaks later. This gym has a multi discrete action space, which makes sense to me for a word, IDK if thats best. To validate words, it has its own exception type. I am trying to train this with stable_baselines3, but the exception keeps being raised, since it is trying to guess garbled words like "xcjhr". Is there a way I can validate actions before they are made so the model is restricted to only guessing valid words? Is there a better way to do this? It doesnt need to be the best, it really only has to sorta work. Any help is appreciated, thanks! submitted by /u/ClackHack [link] [comments]  ( 9 min )
    Looking For Advice on Training and Reward Functions
    Hi Everyone, I'm venturing into a new territory of Reinforcement Learning (RL) through a personal project, despite having a solid background in various other ML domains. I'm developing an RL agent to play Skyjo, a turn-based card game, and I'm encountering some challenges related to reward optimization and game-ending decisions by the agents. I'd appreciate any advice or insights you might have! Project Overview: Objective: Develop an RL model to play Skyjo competitively. Environment: Built using Gymnasium and Pytorch. Agents: Two agents working in tandem - one for card selection (discard/draw) and the other for action and location selection. Training: 4-8 agent instances play against each other. Repository: https://github.com/grantslewis/auto_skyjo Reward Structure: Small p…  ( 10 min )
    My frustration level with Torch/Keras/Tensorflow and DQNs is killing me
    RANT: I've tried every possible example I can get my hands on. I've looked at reference examples. I've looked at Medium articles. I've looked at stuff written by college freshmen. Every example I find for a DQN written either for torch or tensorflow (and either tf_agents or keras), seems to either have a nasty bug preventing it to work or such a severe memory leak that it is unusable. I tried Torch recently and was doing some simple gridworlds. It does fine for tiny gridworlds like 5x5. I decided to push it a little (not much at all) to a known 21x21 gridworld from recognized papers - reference example died and ran out of memory after 3000 episodes - I mean - really? 3000 episodes? I ran on CPU and gave it 64GB. I don't know how much memory this SHOULD take. I can do it in a Q-Table for…  ( 10 min )
    Advice to improve outcome on a turn-based strategy game
    Hello everyone, I'm a total beginner in the reinforcement learning (RL) community, and I would appreciate some advice on a problem I'm currently facing. I've created a simple 2D turn-based game with only movement at the moment (I will also add combat features when I have success with training an AI for the movements). Game The rules are simple : A grid of 14x40 (560 cells in total) 1 Agent with a limited number of Move Point (MP) 1 Target that does not move (atm) The agent can end its turn to get its MP back I already implemented a pathfinding algorithm using A* which works really well but I would like to train an AI to reach the target as fast as possible (turn-wise). Here is a simulation of a state : ​ https://preview.redd.it/0p5yijnb60sb1.png?width=442&format=png&auto=…  ( 10 min )
    Cleanba, our new distributed DRL platform is finally out 🤗
    submitted by /u/vwxyzjn [link] [comments]  ( 8 min )
  • Open

    DSC Weekly 3 October 2023
    Announcements Top Stories In-Depth The post DSC Weekly 3 October 2023 appeared first on Data Science Central.  ( 20 min )
    Generative AI Megatrends: ChatGPT can see, hear and speak – but what does it mean when ChatGPT can think?
    One of the most impressive generative AI applications I have seen is viperGPT. The image / site explains it best. The steps are: This example, earlier this year, showed the potential of multimodal LLMs And as of last week, that future is upon us ChatGPT can now see, hear & speak. What are the implications… Read More »Generative AI Megatrends: ChatGPT can see, hear and speak – but what does it mean when ChatGPT can think? The post Generative AI Megatrends: ChatGPT can see, hear and speak – but what does it mean when ChatGPT can think? appeared first on Data Science Central.  ( 20 min )
    Cracking the code: The rising demand for data scientists in various industries
    In the ever-evolving landscape of the digital era, the relentless quest for deriving actionable insights from a sea of information has become the cornerstone of innovation and strategy. As businesses and organizations strive to navigate the complex corridors of big data, the spotlight invariably falls upon the expertise of data scientists, the modern-day architects of… Read More »Cracking the code: The rising demand for data scientists in various industries The post Cracking the code: The rising demand for data scientists in various industries appeared first on Data Science Central.  ( 21 min )
    Generative AI megatrends: How many LLMs would you subscribe to?
    I recently subscribed to openAI GPT4 for the OpenAI Code Interpreter/Advanced data analytics. We are using it in our class at the University of Oxford.  Its really cool and we are also waiting the multimodal openAI features Recently, a well known AI critic said that he does not see how Generative AI companies could be… Read More »Generative AI megatrends: How many LLMs would you subscribe to? The post Generative AI megatrends: How many LLMs would you subscribe to? appeared first on Data Science Central.  ( 19 min )
    A few highlights of the Efficient Generative AI Summit (EGAIS)
    Large language models (LLMs) for generating text and vision models for generating images are notoriously inefficient. The larger they get, the more power hungry they become.   Kisaco Research in September hosted a one-day event in Santa Clara dedicated to the topic of generative artificial intelligence (GAI) efficiency, followed by a three-day Summit on Hardware and… Read More »A few highlights of the Efficient Generative AI Summit (EGAIS) The post A few highlights of the Efficient Generative AI Summit (EGAIS) appeared first on Data Science Central.  ( 21 min )
  • Open

    AI copilot enhances human precision for safer aviation
    Designed to ensure safer skies, “Air-Guardian” blends human intuition with machine precision, creating a more symbiotic relationship between pilot and aircraft.  ( 8 min )
  • Open

    Accelerate Foundation Models Research: Supporting a global academic research ecosystem for AI
    A diverse research ecosystem is essential to realizing the promise of AI. Accelerate Foundation Models Research aims to expand access to powerful models, engaging academics outside of computer science to pursue a broad range of important opportunities. The post Accelerate Foundation Models Research: Supporting a global academic research ecosystem for AI appeared first on Microsoft Research.  ( 10 min )
  • Open

    Meet the Maker: Robotics Student Rolls Out Autonomous Wheelchair With NVIDIA Jetson
    With the help of AI, robots, tractors and baby strollers — even skate parks — are becoming autonomous. One developer, Kabilan KB, is bringing autonomous-navigation capabilities to wheelchairs, which could help improve mobility for people with disabilities. The undergraduate from the Karunya Institute of Technology and Sciences in Coimbatore, India, is powering his autonomous wheelchair Read article >  ( 6 min )
    CG Geek Makes VFX Look Easy This Week ‘In the NVIDIA Studio’
    Releasing a 3D tutorial dubbed The Easiest VFX Tutorial Ever takes supreme confidence and the skills to back it up. Steve Lund a.k.a. CG Geek — the featured artist of this week’s In the NVIDIA Studio installment — has both in spades.  ( 8 min )
  • Open

    From graph theory to category theory
    Let G be a directed graph whose nodes are the positive integers and whose edges represent relations between two integers. In our first example we’ll draw an edge from x to y if x is a multiple of y. In our second example we’ll draw an edge from x to y if x ≥ y. […] From graph theory to category theory first appeared on John D. Cook.  ( 6 min )
    Test functions
    Test functions are how you can make sense of functions that aren’t really functions. The canonical example is the Dirac delta “function” that is infinite at the origin, zero everywhere else, and integrates to 1. That description is contradictory: a function that is 0 almost everywhere integrates to 0, even if you work in extended […] Test functions first appeared on John D. Cook.  ( 6 min )
    Groups vs Abelian groups: Pedantic or profound?
    This article will probably only be of interest to a small number of readers. Those unfamiliar with category theory may find it bewildering, and those well versed in category theory may find it trivial. My hope is that someone in between, someone just starting to get a handle on category theory, will find it helpful. […] Groups vs Abelian groups: Pedantic or profound? first appeared on John D. Cook.  ( 7 min )
  • Open

    DALL·E 3 system card
    No content preview  ( 1 min )

  • Open

    [Discussion] I didn't do well in Calculus III
    So I got an A in calculus three but I probably didn't deserve it since it was online and all I did was look up the answer and understand the problems given on the test. So I probably have a C level understanding. Will I be tested on calc 3 knowledge in machine learning or should I retake calc 3? submitted by /u/Glittering-Target-87 [link] [comments]  ( 9 min )
    [P] Hand keypoint detection
    Hello Reddit, I have a question regarding the right tool. I'm looking for a tool / model to detect hand-keypoints in a video stream of a person assembling stuff. I know OpenPose is a possible one, also Google MediaPipe. I’m not really getting along with OpenPose and MediaPipe don’t show really good results. In my project, I would like to detect hand keypoints in assembly scenarios. It would be ok to use 2 cameras or a depth camera if necessary. Does anybody knows any models / tools to use? Thanks in advance :) submitted by /u/VGHMD [link] [comments]  ( 9 min )
    [P] Best option for a large, local embedding database?
    Langchain offers a wide array of vector databases for text embedding models. I need to create a vector database for around 3 million sentence embeddings, each being of dimension 384. I'm building a prototype, so it has to be local and free of charge to use. So far, I've hit limits for Chroma (41,666 max). I've also tried Redis, QDrant and FAISS - each of these gets so large that it eats up all the RAM and the process gets killed, or with QDrant, just errors out. I've used Pinecone before, but I don't really want to pay for a prototype as I have plenty of disk space. I was thinking of chunking the 3 million documents into local vector stores of size 41,666 using ChromaDB - but there isn't a whole lot out there about whether Chroma would allow me to merge all ~70 of these smaller databases into a bigger one for search. I also cannot find whether it would be possible to load all 70 of these into memory and search each one individually. So what are my options? My other thought was just creating a large Doc2Vec model, however I would like to use something more sophisticated like Huggingface embedding models. submitted by /u/russ_fegoli [link] [comments]  ( 9 min )
    [D] Proof of convergence for a heavy-ball adaptive step-size algorithm for non-convex functions
    Hello everyone, I am struggling with prooving convergence for an optimizer which uses adaptive step-size with heavy ball algorithm for convex and non-convex functions. In some literature, I could find a regret bound analysis/proof for convex functions and proving that the estimated gradient at t -> inf goes to zero for non-convex functions. There are some assumptions and preconditions: The algorithm is heavy ball momentum with adaptive step-size. ' X_(k+1) = X_k - \eta_k . \nabla(f(x_k)) + \beta(x_k - x_(k-1)) The following assumptions are made: A. The function is smooth. B. The function is Lipschitz. C. The gradients are Lipschitz. I attempt to prove the convergence to a critical point or a local minima. Where the estimate of the gradients at any instance k goes to zero. i.e. E[\nabla(f(x_k))] = 0 as t -> inf. Could anyone please guide me through the process of convergence proof for non-convex functions or give me literature recommendations for the same. Thank you very much in advance. submitted by /u/Loose_Foundation5990 [link] [comments]  ( 9 min )
    [D] open problems after GPT4 capabilities
    We all know that LLMs (and especially foundation models) are extremely functionally capable. Has anyone made a nice list of deficiencies that they show? I know Gary Marcus did so many years ago, but after GPT3 and GPT4 -- what is still unsolved? submitted by /u/Cultural-Average3959 [link] [comments]  ( 9 min )
    [D] Hoeffdings inequality, does it make sense practically?
    According to it, increasing the hypotheses set loosens the upper bound between in-sample and out-of-sample error. ​ Can't we subdivide the hypotheses set to multiple ones, ensuring tighter bounds in general? ​ and generally, have you seen it in use before? I have seen a lot of ML projects without anybody mentioning it or anything theoretical. submitted by /u/2azo [link] [comments]  ( 9 min )
    [P] Good models to use for multimodal object detection when both the modalities are image based or some models which support ensembling?
    So basically I have a dataset with images of vehicles in top down view in both RGB and IR, what are some models I can use for both unimodal and multimodal object detection to compare their performance. Links to GitHub repos would be helpful. Thanks submitted by /u/Xyber5 [link] [comments]  ( 9 min )
    Benefits of converting DICOM images to PNG's [P]
    I try to understand what are the benefits to convert DICOM images to PNG's. Context: I have DICOM images which I already extracted the useful meta-data I want to use. Those images are for a task, classification-detection pipeline of some disease. So as I already asked, what are the benefits of converting those DICOM files to PNG's rather then just using pydicom and the dicom pixel_array? Reason I ask this is because I saw many top 5 users on kaggle do this when dealing with DICOM images. If I understand how networks actually works, they get as input an array of pixels as floating point numbers no? So what's the differences between DICOM pixel_array to PNG's pixel array and numpy array or tensor? both are eventually will be fed to the network as a tensor of floating numbers. Is the reason is because PNG's are usually faster to train? Is the reason is because PNG's have more libraries support for preprocessing / augmentation / etc. ? Is the reason is because PNG's are the format many pre-trained models expect to? (I write this knowing it's 99% not true, as mentioned the tensor thing) Thanks in Advance, and Please, forgive my English (I could use AI tools to fix it but I feel addicted already) submitted by /u/01jasper [link] [comments]  ( 9 min )
    [D] What kind of distribution is this?
    Hey guys, I am wondering what kind of distribution my data are following? I want to fit a distribution function to them and use this fitted distribution function to generate new samples with a given mean and standard deviation (python). Any tips for this? Happy to hear your suggestions :) https://preview.redd.it/kdcftvpq8urb1.png?width=408&format=png&auto=webp&s=6163b9f571069e098c9e9a609c3d1cb9910fe1fb submitted by /u/Tigmib [link] [comments]  ( 9 min )
    [R] Efficient Streaming Language Models with Attention Sinks - Meta AI 2023 - StreamingLLM enables Llama-2, Falcon and Pythia to have an infinite context length without any fine-tuning! Allows streaming use of LLMs!
    Paper: https://arxiv.org/abs/2309.17453 Github: https://github.com/mit-han-lab/streaming-llm Abstract: Deploying Large Language Models (LLMs) in streaming applications such as multi-round dialogue, where long interactions are expected, is urgently needed but poses two major challenges. Firstly, during the decoding stage, caching previous tokens' Key and Value states (KV) consumes extensive memory. Secondly, popular LLMs cannot generalize to longer texts than the training sequence length. Window attention, where only the most recent KVs are cached, is a natural approach -- but we show that it fails when the text length surpasses the cache size. We observe an interesting phenomenon, namely attention sink, that keeping the KV of initial tokens will largely recover the performance of wind…  ( 9 min )
    [Project] I just released an open-source package, TorchLens, that can extract the activations/metadata from any PyTorch model, and visualize its structure, in just one line of code. I hope it helps you out!
    You just give it any PyTorch model (as-is, no changes needed), and it spits out a data structure with the activations of any layer you want, along with a bunch of metadata about the model and each layer and an optional automatic visualization of the model's computational graph. I hope this greatly speeds up the process of extracting features from models for further analysis, and also serves as an aid in quickly understanding new models. I also hope it'd be helpful for teaching purposes, too. It is meant to work for any PyTorch model whatsoever and I've tested it on hundreds of models (see the "model menagerie" of visualizations below), though it's always possible I've missed some edge case or another. Hope it helps you out--I'm still actively developing it, so let me know if there's anything on your wishlist! https://preview.redd.it/k37nhejvxtrb1.png?width=640&format=png&auto=webp&s=5713a8711110644794e2264d84dd479ede861c5e GitHub Repo Twitter Thread Paper CoLab Tutorial Gallery of Model Visuals submitted by /u/therealjmt91 [link] [comments]  ( 9 min )
    [D] Why Vision Tranformers?
    Transformers have been the new kid on the block, easy to see why with LLMs and and sequential output generation, but I still don't know why vision transformers based on ViT are so hot in the field right now. From my understanding, CNNs are just vastly better than transformers for vision tasks, as its inductive biases allows it to determine the relationship between neighboring features of an image via pooling and filters. However, transformers don't have this kind of inductive bias, and as a result, take much more data and compute to reach similar levels of performance. I read this survey paper on Vision Transformers here: https://arxiv.org/pdf/2012.12556.pdf, which has the performance of CNNs vs various transformer models for CV. Comparing even the best vision transformers to the classic …  ( 10 min )
    [R] Tool-Integrated Reasoning: A New Approach for Math-Savvy LLMs
    When trying to get language models to solve complex math problems, researchers kept running into limits. Models like GPT-3 and ChatGPT still struggle with advanced algebra, calculus, and geometry questions. The math is just too abstract and symbol-heavy for them. To break through this barrier, researchers from Tsinghua University and Microsoft taught models to combine natural language reasoning with calling external math tools. The key is their new "tool-integrated reasoning" format. Models generate a natural language plan first, then write code to invoke tools like SymPy to solve equations. They take the output results and continue verbal reasoning. By interleaving natural language and symbolic computations, they get the best of both worlds - semantic understanding from language models and rigorous math from tools. They trained versions of the LLaMA model this way, producing their Tool-Integrated Reasoning Agent (TORA). They present some strong results: In evaluations on 10 math datasets, TORA substantially outperformed prior state-of-the-art methods, achieving 13-19% higher accuracy on average. On one competition test, TORA-7B scored 40% accuracy, beating the previous best model by 22 percentage points. This demonstrates that integrating tools directly into the reasoning process can significantly enhance mathematical capabilities, even for large models like GPT-4. However, tough problems involving geometry and advanced algebra are still there. New techniques for symbolic reasoning and spatial understanding will likely be needed to push further. Overall though, tool integration seems a promising path to improve reasoning skills. Applying this to other domains like logic and programming could also be impactful. TLDR: Teaching language models to use math tools helps them solve way more complex problems. Full Paper Summary arXiv Link submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [P] Awesome AI developer productivity Github repo
    Hello everyone, We've begun gathering a variety of AI coding tools used in one place to make things easier for everyone. We're inviting everyone to check out our collection, and maybe even add tools you find useful. You can find the repository here: https://github.com/gaborsoter/awesome-ai-dev-productivity Feel free to explore and contribute! submitted by /u/BootstrapGuy [link] [comments]  ( 9 min )
    [R] On the Biometric Capacity of Generative Face Models
    We developed a statistical model to estimate “How many unique identities can a generative face model generate?” without exhaustively generating a lot of faces. Abstract: There has been tremendous progress in generating realistic faces with high fidelity over the past few years. Despite this progress, a crucial question remains unanswered: “Given a generative face model, how many unique identities can it generate?” In other words, what is the biometric capacity of the generative face model? A scientific basis for answering this question will benefit evaluating and comparing different generative face models and establish an upper bound on their scalability. This paper proposes a statistical approach to estimate the biometric capacity of generated face images in a hyperspherical feature space. We employ our approach on multiple generative models, including unconditional generators like StyleGAN, Latent Diffusion Model, and “Generated Photos,” as well as DCFace, a class-conditional generator. We also estimate capacity w.r.t. demographic attributes such as gender and age. Our capacity estimates indicate that (a) under ArcFace representation at a false acceptance rate (FAR) of 0.1%, StyleGAN3 and DCFace have a capacity upper bound of 1.43 million and 11,900, respectively; (b) the capacity reduces drastically as we lower the desired FAR with an estimate of 17,960 and 562 at FAR of 1% and 10%, respectively, for StyleGAN3; (c) there is no discernible disparity in the capacity w.r.t gender; and (d) for some generative models, there is an appreciable disparity in the capacity w.r.t age. Paper: https://arxiv.org/abs/arXiv:2308.02065 Code: https://github.com/human-analysis/capacity-generative-face-models submitted by /u/VishDev [link] [comments]  ( 9 min )
    [P] Comgra: A library for debugging and understanding neural networks
    I'm a machine learning engineer and researcher. I got fed up with how difficult it is to understand why neural networks behave the way they do, so i wrote a library to help with it. Comgra (computation graph analysis) is a library you can use with pytorch to extract all the tensor data you care about and visualize it graphically in a browser. This allows for a much more detailed analysis of what is happening than the usual approach of using tensorboard. You can go investigate tensors as training proceeds, drill down into individual neurons, inspect single data sets that are of special interest to you, track gradients, compare statistics between different training runs, and more. This tool has saved me a ton of time in my research by letting me check my hypotheses much more quickly than normal and by helping me understand how the different parts of my network really interact. I first published this a month ago and have made some improvements since then. I would be happy to hear even more feedback! My goal is to make this the go-to library used both by novices who want to understand what's going on under the hood, and by researchers in neural architecture design. submitted by /u/Smart-Emu5581 [link] [comments]  ( 9 min )
    [D] The most complete Audio ML toolkit 🚀
    Hugging Face Transformers is a complete audio toolkit that provides state-of-the-art models for all audio tasks, including TTS, ASR, audio embeddings, audio classification and music generation. All you need to do is install the Transformers package: pip install --upgrade transformers And then all of these models can be used in just 3 lines of code: ​ TTS Example usage: from transformers import pipeline generator = pipeline("text-to-speech", model="suno/bark-small") speech = generator("Hey - it's Hugging Face on the phone!") Available models: Bark https://huggingface.co/suno/bark MMS TTS https://huggingface.co/facebook/mms-tts-eng VITS https://huggingface.co/kakao-enterprise/vits-vctk SpeechT5 https://huggingface.co/microsoft/speecht5_tts And more! https://huggingface.co/mo…  ( 9 min )
    [R] The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) - Microsoft 2023 - 166 Pages!
    Paper: https://arxiv.org/abs/2309.17421 Youtube: https://youtu.be/Q0pP782dSh0?si=MiJAlK5k-KEyQ-Zr Abstract: Large multimodal models (LMMs) extend large language models (LLMs) with multi-sensory skills, such as visual understanding, to achieve stronger generic intelligence. In this paper, we analyze the latest model, GPT-4V(ision), to deepen the understanding of LMMs. The analysis focuses on the intriguing tasks that GPT-4V can perform, containing test samples to probe the quality and genericity of GPT-4V's capabilities, its supported inputs and working modes, and the effective ways to prompt the model. In our approach to exploring GPT-4V, we curate and organize a collection of carefully designed qualitative samples spanning a variety of domains and tasks. Observations from these samples demonstrate that GPT-4V's unprecedented ability in processing arbitrarily interleaved multimodal inputs and the genericity of its capabilities together make GPT-4V a powerful multimodal generalist system. Furthermore, GPT-4V's unique capability of understanding visual markers drawn on input images can give rise to new human-computer interaction methods such as visual referring prompting. We conclude the report with in-depth discussions on the emerging application scenarios and the future research directions for GPT-4V-based systems. We hope that this preliminary exploration will inspire future research on the next-generation multimodal task formulation, new ways to exploit and enhance LMMs to solve real-world problems, and gaining better understanding of multimodal foundation models. https://preview.redd.it/qkytzg2rjqrb1.jpg?width=511&format=pjpg&auto=webp&s=fc306dc6ae64100e993639f8e27583b809bf8a5c https://preview.redd.it/z4kq0l2rjqrb1.jpg?width=507&format=pjpg&auto=webp&s=d4fda59456846fa7a6c9b318b21fc9c544bd2b68 https://preview.redd.it/1ptrkk2rjqrb1.jpg?width=712&format=pjpg&auto=webp&s=2b44fbc949e76fdf20d05b1236f56c87ba5efece ​ ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [P] NanoPhi, Implementing some of the success of Phi-1.5, with GPT-2(124m)
    Hi, i'm trying to replicate at least some of the success of Phi 1.5 on a model 10x smaller, gpt-2 124m. I have started with model finetuning, and have a simple github with roadmap, https://github.com/VatsaDev/NanoPhi, check it out there! submitted by /u/vatsadev [link] [comments]  ( 9 min )
  • Open

    Code Llama code generation models from Meta are now available via Amazon SageMaker JumpStart
    Today, we are excited to announce Code Llama foundation models, developed by Meta, are available for customers through Amazon SageMaker JumpStart to deploy with one click for running inference. Code Llama is a state-of-the-art large language model (LLM) capable of generating code and natural language about code from both code and natural language prompts. Code […]  ( 11 min )
    Build an end-to-end MLOps pipeline for visual quality inspection at the edge – Part 1
    A successful deployment of a machine learning (ML) model in a production environment heavily relies on an end-to-end ML pipeline. Although developing such a pipeline can be challenging, it becomes even more complex when dealing with an edge ML use case. Machine learning at the edge is a concept that brings the capability of running […]  ( 10 min )
    Build an end-to-end MLOps pipeline for visual quality inspection at the edge – Part 2
    In Part 1 of this series, we drafted an architecture for an end-to-end MLOps pipeline for a visual quality inspection use case at the edge. It is architected to automate the entire machine learning (ML) process, from data labeling to model training and deployment at the edge. The focus on managed and serverless services reduces […]  ( 9 min )
    Build an end-to-end MLOps pipeline for visual quality inspection at the edge – Part 3
    This is Part 3 of our series where we design and implement an MLOps pipeline for visual quality inspection at the edge. In this post, we focus on how to automate the edge deployment part of the end-to-end MLOps pipeline. We show you how to use AWS IoT Greengrass to manage model inference at the […]  ( 9 min )
  • Open

    [D] RL agenda after LLMs or S4?
    Many other students in my research institution are pretty worried after ChatGPT / LLMs about continuing work in RL and are thinking of leaving the field. What are main the open problems in RL after LLMs and S4 can solve a hefty chunk of sequence learning problems? submitted by /u/Cultural-Average3959 [link] [comments]  ( 9 min )
    RLHF without GAE
    If I already have a trained reward model, say a sentiment classification model, that I'd like to use for PPO-based RLHF, I believe the standard method would be to instantiate the Critic/value function using the reward model, and train it further during PPO, correct? Would it even make sense to try PPO for RLHF without using the GAE term and thus without the value function, and just directly using the reward model's output as the advantage? It seems that this would be require viewing the entire generation as a single action (rather than each token's generation as an action), but most of the articles I've read on RLHF seem to treat it that way. On the other hand, all the code implementations I've seen have an Actor-Critic model producing values at each token, which I think implies that each token is an action. Edit: Apologies if any of this is just me having fundamental gaps in my understanding! submitted by /u/ganzzahl [link] [comments]  ( 9 min )
    3-player graph pursuit game
    So I am trying to find NE using rl algorithms for a turn based deterministic graph pursuit game. I have a way of checking if the strategies of players 1,2,3 are a NE and thought of using Q-Learning and see if it converges to a NE. Thus far it doesnt seem to work and I wonder if I made a mistake. The state is described as: St = [x1 x2 x3 p] where current player is p and x1,x2,x3 are the locations of the players in the graph Players have value functions Q^1(St), Q^2(St), Q^3(St) The way I update my value function is: player i choose e-greedy action a_t and the new state St_new Q^i(St) = (1-alpha)*Q^i(St)+alpha*gamma*Q(St_new) I have tried using a memory buffer but I havent improve the convergence success. I check if the if the values are a NE every 1000 iterations. It only converges for simple graphs. Do you think the way I update my value function is correct? Do you have any other traditional algorithms to suggest? Shall I move to deep learning? I am worried if simple algorithms cant converge the neural networks wont either... I tried to implemenet Nash Q learning following the paper:https://www.jmlr.org/papers/volume4/hu03a/hu03a.pdf but I am not sure if implemented correctly for a turn based game submitted by /u/__gp_ [link] [comments]  ( 9 min )
  • Open

    Save 20 Hours A Week With This 1 Simple ChatGPT Prompt for Productivity
    submitted by /u/Senior_tasteey [link] [comments]  ( 8 min )
    AI Anxiety’ Is on the Rise–Here’s How to Manage It
    Artificial intelligence (AI) anxiety is on the rise, but there are ways to manage it. While AI may outperform humans in certain tasks, humans are not yet headed for all-out replacement. Recent research shows that AI programs scored higher than humans in tasks requiring originality, but the highest-rated human ideas were still considered more creative. The rise of generative AI tools in industries like animation has left some professionals anxious about the future of their work. Experts suggest managing AI fears by understanding the historical context of technological advancements and focusing on the benefits and training opportunities that AI brings. Source : https://www.scientificamerican.com/article/ai-anxiety-is-on-the-rise-heres-how-to-manage-it/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Tool-Integrated Reasoning: A New Approach for Math-Savvy LLMs
    When trying to get language models to solve complex math problems, researchers kept running into limits. Models like GPT-3 and ChatGPT still struggle with advanced algebra, calculus, and geometry questions. The math is just too abstract and symbol-heavy for them. To break through this barrier, researchers from Tsinghua University and Microsoft taught models to combine natural language reasoning with calling external math tools. The key is their new "tool-integrated reasoning" format. Models generate a natural language plan first, then write code to invoke tools like SymPy to solve equations. They take the output results and continue verbal reasoning. By interleaving natural language and symbolic computations, they get the best of both worlds - semantic understanding from language models and rigorous math from tools. They trained versions of the LLaMA model this way, producing their Tool-Integrated Reasoning Agent (TORA). They present some strong results: In evaluations on 10 math datasets, TORA substantially outperformed prior state-of-the-art methods, achieving 13-19% higher accuracy on average. On one competition test, TORA-7B scored 40% accuracy, beating the previous best model by 22 percentage points. This demonstrates that integrating tools directly into the reasoning process can significantly enhance mathematical capabilities, even for large models like GPT-4. However, tough problems involving geometry and advanced algebra are still there. New techniques for symbolic reasoning and spatial understanding will likely be needed to push further. Overall though, tool integration seems a promising path to improve reasoning skills. Applying this to other domains like logic and programming could also be impactful. TLDR: Teaching language models to use math tools helps them solve way more complex problems. Full Paper Summary arXiv Link submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    Bard, what would it take to fast-track AI morality so that it surpasses human ability by 2025?
    To fast-track AI morality to surpass human ability by 2025, Google, OpenAI, Anthropic, and other developers could: Invest heavily in AI ethics research and development.** For example, Google AI has launched the Responsible AI Research and Development (RAIR) program, which is focused on developing new AI ethics research and tools. Create large-scale datasets of human moral reasoning.** For example, OpenAI has released the Moral Machine dataset, which contains over 1 million crowdsourced responses to moral dilemmas. Develop new AI architectures that are specifically designed for moral reasoning.** For example, Anthropic is developing a new AI architecture called "Moral Machine Learning," which is designed to learn from human moral reasoning data. Establish a global AI ethics research consortium and create a public database of human moral reasoning datasets.** For example, the Partnership on AI has launched the Safeguarding AI initiative, which is working to develop new safety mechanisms for AI systems. Fund research into developing new AI architectures for moral reasoning and develop new AI evaluation metrics for moral performance.** For example, the Moral Machine project is developing new evaluation metrics for AI systems' moral performance. By working together, Google, OpenAI, Anthropic, and other developers can help to fast-track AI morality and create AI systems that are more moral than humans. (199 words) submitted by /u/Georgeo57 [link] [comments]  ( 9 min )
    AI & Us Navigating the Digital Renaissance
    submitted by /u/Einsof__ [link] [comments]  ( 8 min )
    Prompt enginnering questions
    Is propt engineering a legit job ?? Is it here to stay ? Is it worth studying ? Best way to study it , land a job or freelancing ? submitted by /u/metasubcon [link] [comments]  ( 8 min )
    What app/program are they using on this Instagram?
    How does one make videos like on this Instagram page? It's unreal. https://instagram.com/nostalgicraindrops?igshid=MzRlODBiNWFlZA== submitted by /u/CK1886 [link] [comments]  ( 8 min )
    ChatGPT Can Now See? Mind-Blowing Ways People Can Use Image Recognition!
    submitted by /u/Senior_tasteey [link] [comments]  ( 8 min )
    Let’s make a list of the BEST AI TOOLS for students
    Every day, new AI tools appear. There are also AI tools designed to make students' lives easier—from AI essay generators to study organizers. While there are many directories with AI tools, they are often not well-sorted for students. So, I've compiled a list of my favorite AI tools for educational purposes. AI tool How to use for studies Bing Chat - Writing excel formulas - Making graphs and charts - Answers for homework assignments - Researching for a paper Textero.ai - Search for relevant academic sources for essays - Research assistance with the "Ask AI" feature - Essay generation and paper formatting - Structured essay outline creation - Summarizing of texts ChatPDF - Interacting with academic PDFs - Asking specific questions about the content - Quickly locating essential data for assignments Socratic - Breaking down complex homework questions - Providing step-by-step educational guidance - Safe and interactive learning Writely AI - Improving grammar and writing clarity - Creating concise study notes - Feedback for content quality Turnitin - Checking for copied content - Comparing against a vast academic database - Highlighting potential plagiarism Got any to add to the list? Let's share and help each other! submitted by /u/loyallyUrticate [link] [comments]  ( 9 min )
    Tested Dalle, created a monster.
    submitted by /u/Grindmaster_Flash [link] [comments]  ( 8 min )
    Meta's Llama 2 Long outperforms GPT 3.5 and Claude 2
    Meta Platforms recently introduced Llama 2 Long, a revolutionary AI model that outperforms top competitors with its ability to generate accurate responses to long user queries. For the latest advancements in AI, look here first. https://preview.redd.it/geqqd3k5rprb1.png?width=1920&format=png&auto=webp&s=e72a67fc7ef7e85902169f3061529c136beadc87 Meta's new AI model As an enhancement of the original Llama 2, Llama 2 Long deals with larger data containing longer texts and is modified to handle lengthier information sequences. Its stellar performance outshines other models such as OpenAI's GPT-3.5 Turbo and Claude 2. How Llama 2 Long works Meta built different versions of Llama 2, ranging from 7 billion to 70 billion parameters, which refines its learning from data. Llama 2 Long employs Rotary Positional Embedding (RoPE) technique, refining the way it encodes the position of each token, allowing fewer data and memory to produce precise responses. The model further fine-tunes its performance using reinforcement learning from human feedback (RLHF), and synthetic data generated by Llama 2 chat itself. Impressive feats and future aspirations Llama 2 Long can create high-quality responses to user prompts up to 200,000 characters long, which is approximately 40 pages of text. Its ability to generate responses to queries on diverse topics such as history, science, literature, and sports indicates its potential to cater to complex and various user needs. The researchers see Llama 2 Long as a step towards broader, more adaptable AI models, and advocate for more research and dialogue to harness these models responsibly and beneficially. (source) P.S. If you like this kind of analysis, I write a free newsletter that tracks the most relevant news and developments in AI. Professionals from Meta, Google, and OpenAI are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    AI Image Generator That Is Good At Referencing Pop Culture
    I've recently tried Canva and Dall-E to generate an image that references two popular games, Dark Souls 3 and Baldur's Gate 3. And they both fall on their face. Maybe my prompt is bad but Canva is not getting me what I want. Dall-E ran out of free credits. Do you guys have any recommendations. Midjourney is no longer free now. I would like this to be free and has good references to popular culture. submitted by /u/livingroomsessions [link] [comments]  ( 9 min )
  • Open

    Awarded DAGM MVTec Dissertation Award 2023
    In September, I received the DAGM MVTec dissertation award 2023 for my PhD thesis. DAGM is the German association for pattern recognition and organizes the German Conference on Pattern Recognition (GCPR) which is Germany's prime conference for computer vision and related research areas. I feel particularly honored by this award since my academic career started with my first paper published as part of the young researcher forum at GCPR 2015 in Aachen. The post Awarded DAGM MVTec Dissertation Award 2023 appeared first on David Stutz.  ( 3 min )
  • Open

    Supereggs, squigonometry, and squircles
    The Depths of Wikipedia twitter account posted a screenshot about supereggs that’s popular at the moment. It says there’s no way this is real. they must be making these words up above a screenshot from the Wikipedia article on supereggs saying The definition can be changed to have an equality rather than an inequality; this […] Supereggs, squigonometry, and squircles first appeared on John D. Cook.  ( 5 min )
    Corny AI
    Meredith Whittaker posted on Twitter that In addition to being the best in privacy, Signal is also the best in not subjecting you to corny ‘AI’ features no one asked for or wants. I love the phrase “corny AI.” That’s exactly what a lot of AI features are. “Would you like help composing that tweet?” […] Corny AI first appeared on John D. Cook.  ( 5 min )
    Today’s star
    The star-like image above is today’s exponential sum. The exponential sum page on my site generates a new image each day by putting the numbers of the day’s month, day, and year into the equation and connecting the partial sums in the complex plane. Here m is the month, d is the day, and y […] Today’s star first appeared on John D. Cook.  ( 5 min )
  • Open

    A more effective experimental design for engineering a cell into a new state
    By focusing on causal relationships in genome regulation, a new AI method could help scientists identify new immunotherapy techniques or regenerative therapies.  ( 10 min )
    Is AI in the eye of the beholder?
    Study shows users can be primed to believe certain things about an AI chatbot’s motives, which influences their interactions with the chatbot.  ( 10 min )
  • Open

    Fishing with Neural Nets | Transforming Ecology with Artificial Intelligence
    submitted by /u/plutoandmal [link] [comments]  ( 8 min )
  • Open

    Google at ICCV 2023
    Posted by Shaina Mehta, Program Manager, Google Google is proud to be a Platinum Sponsor of the International Conference on Computer Vision (ICCV 2023), a premier annual conference, which is being held this week in Paris, France. As a leader in computer vision research, Google has a strong presence at this year’s conference with 60 accepted papers and active involvement in 27 workshops and tutorials. Google is also proud to be a Platinum Sponsor for the LatinX in CV workshop. We look forward to sharing some of our extensive computer vision research and expanding our partnership with the broader research community. Attending ICCV 2023? We hope you’ll visit the Google booth to chat with researchers who are actively pursuing the latest innovations in computer vision, and check out som…  ( 94 min )

  • Open

    CGPT-4, how would an AI select the 100 most intelligent people on the planet based on their content rather than on their positions and awards?
    If an AI were to take on this task, it would primarily rely on data mining and natural language processing (NLP) techniques. First, the AI would scrape digital platforms to collect works authored by various people. Platforms could range from academic databases like PubMed or ArXiv to blogs, forums, and social media. The goal would be to capture a wide array of intellectual output, irrespective of the person's titles or accolades. Next, an NLP model would evaluate the collected content based on factors like originality, complexity, and coherence. Advanced sentiment analysis could be employed to gauge the depth of understanding and nuances in the arguments made. Specialized algorithms could also be developed to assess the impact of each piece of work, using metrics such as citations, social shares, or subsequent works that build upon it. The AI would then create a shortlist based on these evaluations. This stage might also involve unsupervised machine learning techniques like clustering to find patterns or commonalities among the top contenders. The final stage would be validation, possibly using reinforcement learning. The AI could simulate various scenarios or problems and predict how the content created by these individuals would contribute to solving them. It would then refine its list based on the simulated outcomes. This all-AI approach would drastically reduce human bias and could be executed relatively quickly. However, it's important to note that any such system would need to be designed carefully to avoid introducing biases present in the training data or algorithms. submitted by /u/Georgeo57 [link] [comments]  ( 9 min )
    So it's unethical to kill an AI robot
    submitted by /u/bharath_brt [link] [comments]  ( 9 min )
    How Big Tech is co-opting the rising stars of artificial intelligence
    Big Tech's dominance in the artificial intelligence (AI) industry is growing as start-ups like Anthropic rely on their computing power and resources. Despite creating breakthrough AI technology, these start-ups still need the support of Big Tech to scale and succeed. The training of AI systems is expensive and requires specialized computer chips and data centers, which are mostly controlled by Amazon, Google, and Microsoft. Regulators, including the Federal Trade Commission and French competition authorities, are monitoring the industry for signs of anticompetitive behavior. Some business leaders believe that competition and efficiency will eventually drive down the cost of running AI models. Source : https://www.washingtonpost.com/technology/2023/09/30/anthropic-amazon-artificial-intelligence/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Data strategy >> Generative AI strategy
    A strong data strategy is crucial for the success of any AI strategy. Generative AI use cases depend on a healthy data infrastructure, including data governance, observability, catalog, data sharing, and lineage. Many enterprises lack the necessary data infrastructure to deploy customer-facing AI apps confidently. Poor data strategy and infrastructure can derail generative AI efforts. Existing issues with data ecosystems, such as data silos and poor data governance, will have a greater impact on generative AI workloads than new issues. Data silos, poor data discoverability, and the lack of data interoperability can become serious bottlenecks for generative AI apps. Source : https://nextword.substack.com/p/data-strategy-matters-for-generative submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Does anyone know a good AI tool to generate tattoo ideas and song cover art?
    Same as title submitted by /u/No-Educator-59 [link] [comments]  ( 9 min )
    Meta, INRIA researchers discover that explicit registers eliminate ViT attention spikes
    When visualizing the inner workings of vision transformers (ViTs), researchers noticed weird spikes of attention on random background patches. This didn't make sense since the models should focus on foreground objects. By analyzing the output embeddings, they found a small number of tokens (2%) had super high vector norms, causing the spikes. The high-norm "outlier" tokens occurred in redundant areas and held less local info but more global info about the image. Their hypothesis is that ViTs learn to identify unimportant patches and recycle them as temporary storage instead of discarding. This enables efficient processing but causes issues. Their fix is simple - just add dedicated "register" tokens that provide storage space, avoiding the recycling side effects. Models trained with registers have: Smoother and more meaningful attention maps Small boosts in downstream performance Way better object discovery abilities The registers give ViTs a place to do their temporary computations without messing stuff up. Just a tiny architecture tweak improves interpretability and performance. Sweet! I think it's cool how they reverse-engineered this model artifact and fixed it with such a small change. More work like this will keep incrementally improving ViTs. TLDR: Vision transformers recycle useless patches to store data, causing problems. Adding dedicated register tokens for storage fixes it nicely. Full summary. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
  • Open

    [R] The unsolved mystery at the heard of the "How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions" paper
    submitted by /u/CellWithoutCulture [link] [comments]  ( 9 min )
    [D] How many instructions can LLMs handle before they start to ignore them?
    Prompt engineering frequently involves trying to encode very specific behaviors into a model to steer it a certain direction. In practice, as requirements become more complex, you often end up with fairly lengthy prompts, especially when using methods like RAG. I was wondering, how effective are LLMs at following instructions as the system prompt grows in size and complexity? I did some quick experiments on this and found that, unsurprisingly, GPT-4 can follow a lot of rules (up to 50) quite accurately. But even GPT-3.5 slowly degrades and Llama-2-70b-chat starts to fail after just a few rules. Comparison of performance metrics over increasing rule counts, demonstrating GPT-4's consistent performance and a decline in accuracy for GPT-3.5 and Llama-2-70b-chat. These results are based on …  ( 10 min )
    [R] LangDiversity: software to identify LLM errors
    Due to challenges such as hallucination, detecting errors in the output of a given prompt becomes an important challenge. LangDiversity is an implementation of "diversity measures" that are domain independent and can be used to measure the uncertainty in the result of a language model. Type pip install langdiversity Video: https://www.youtube.com/watch?v=86J_K9mR7lw Web: https://neurosymbolic.asu.edu/llm-correction/ Visit https://github.com/lab-v2/langdiversity Read the paper: https://arxiv.org/abs/2308.11189 https://preview.redd.it/rb0xg1ly8nrb1.png?width=1021&format=png&auto=webp&s=8e57056d24327ca2987abea12a7a9066a825738b submitted by /u/Neurosymbolic [link] [comments]  ( 9 min )
    [P] Simplest model to run with limited hardware
    We want to run (not train, i.e. think single forward pass only) an ML algorithm on a machine with very limited resources. Which model could we use to show off the possibilities? If the benchmark is an MLP for binary image classification, what else could we do with a similar scale of operations? E.g. Which model is the simplest for e.g. text-to-image generation? Any other ML models that are simple enough to run and if initialized with good params, does something impressive submitted by /u/2i2i_tokenized_time [link] [comments]  ( 9 min )
    [P] Deep Memory, a Way to Boost Retrieval Accuracy by up to +22% for RAG
    submitted by /u/davidbun [link] [comments]  ( 9 min )
    [D] Perplexity.ai Search Feasibility
    I've been using Perplexity.ai for a bit now when it hit me that I don't understand how they can sustain their business model with search. Stuff like Bing search and Google search cost around $5 or more per 1000 searches, so how can they even afford to do this kind of search. Do they have their own search index. Also, I don't know how they pull in the data from these sources so fast? I've played around with some things like this with Langchain with retrieval, but the speed of splitting and tokenizing website html is not very fast. Have they already pre-scrapped the websites from the search results and tokenized them for LLM retrieval? submitted by /u/dragon18456 [link] [comments]  ( 9 min )
    Metagpt use case [D]
    Guys, i am currently working building a project, there are certain tasks like building a ml model using certain use-cases. I wish to automate this task, do u think metagpt is a good fit for the same. Let me know if you need any further information!! EDIT: One of the tasks my app needs to do is to convert image to text (aim to implement image captioning). So, if i give metaGPT the requirements for my project, is it possible it will give me the code which I need. I need to save certain tasks here so that I can focus more on operation and design side. Edit: it seems, such kind of vague questions are not encouraged on this platform, I will work and will straigh away ask questions which are quite good and meet the standards of this platform. Thanks!! Thanks!! Always have a massive respect for this community!! submitted by /u/aristotleTheFake [link] [comments]  ( 9 min )
    [R] Meta, INRIA researchers discover that explicit registers eliminate ViT attention spikes
    When visualizing the inner workings of vision transformers (ViTs), researchers noticed weird spikes of attention on random background patches. This didn't make sense since the models should focus on foreground objects. By analyzing the output embeddings, they found a small number of tokens (2%) had super high vector norms, causing the spikes. The high-norm "outlier" tokens occurred in redundant areas and held less local info but more global info about the image. Their hypothesis is that ViTs learn to identify unimportant patches and recycle them as temporary storage instead of discarding. This enables efficient processing but causes issues. Their fix is simple - just add dedicated "register" tokens that provide storage space, avoiding the recycling side effects. Models trained with registers have: Smoother and more meaningful attention maps Small boosts in downstream performance Way better object discovery abilities The registers give ViTs a place to do their temporary computations without messing stuff up. Just a tiny architecture tweak improves interpretability and performance. Sweet! I think it's cool how they reverse-engineered this model artifact and fixed it with such a small change. More work like this will keep incrementally improving ViTs. TLDR: Vision transformers recycle useless patches to store data, causing problems. Adding dedicated register tokens for storage fixes it nicely. Full summary. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [D] Multiple single class segmentation vs single multiclass segmentation models
    submitted by /u/waterstrider123 [link] [comments]  ( 9 min )
    [R] SOTA of Deep-Shallow Encoder-Decoder LLMs for fast inference
    There's some evidence [1] [2] that it's possible to run text2text language model at substantially (potentially on the order of magnitude) better inference speed by keeping the decoder shallow. I'm curious whether some general reasoner SOTA (small model for machine translation available at [3]) style models are publicly available for this sort of thing. If not, how would one go about training one? Would it be necessary to do it entirely from scratch (extremely costly)? Or would it be possible to take, say, Flan-UL2 (20B), chop off its decoder, and train a much smaller decoder on top of it with the UL2 encoder frozen (ie how one trains adapter layers). Assuming the decoder hyperparameters are kept small, would this be possible within reasonable compute budget? Would that even meaningfully converge with small amount of compute (assuming same training objective as is for UL2)? Would the strength (ie somewhat comparable to 10B if we cut 20B in half) transfer from the SOTA encoder, or would cutting off half of the model layers kneecap it too badly? [1] https://arxiv.org/pdf/2006.10369.pdf [2] https://aclanthology.org/2023.sustainlp-1.6.pdf [3] https://github.com/snoop2head/Deep-Encoder-Shallow-Decoder submitted by /u/upalse [link] [comments]  ( 9 min )
  • Open

    LangDiversity: software to identify LLM errors
    Due to challenges such as hallucination, detecting errors in the output of a given prompt becomes an important challenge. LangDiversity is an implementation of "diversity measures" that are domain independent and can be used to measure the uncertainty in the result of a language model. ​ Type pip install langdiversity Video: https://www.youtube.com/watch?v=86J_K9mR7lw Web: https://neurosymbolic.asu.edu/llm-correction/ Visit https://github.com/lab-v2/langdiversity Read the paper: https://arxiv.org/abs/2308.11189 https://preview.redd.it/o0v8p9g7tmrb1.png?width=1021&format=png&auto=webp&s=ff1ac672b61f96e4669663410769127066a0674d submitted by /u/Neurosymbolic [link] [comments]  ( 9 min )
  • Open

    Reinforcement Learning + Computer Vision listing papers
    Hello everyone! A while back, I stumbled upon an interesting paper that applied Reinforcement Learning to Object Localization. I got fascinated by how computer vision tasks could be transformed into a reinforcement learning problem, making it feel like a Markov decision process ! So, i've decided to create a repository to compile all the existing (published) papers that delve into Reinforcement Learning in Computer Vision : https://github.com/rayanramoul/RLCV-Papers If you have any papers in mind or recommendations to enhance the repository, please don't hesitate to share them. Your input would be greatly appreciated! Thank you! :) submitted by /u/raysamram [link] [comments]  ( 9 min )

  • Open

    Implementing Gradient Descent in PyTorch
    The gradient descent algorithm is one of the most popular techniques for training deep neural networks. It has many applications in fields such as computer vision, speech recognition, and natural language processing. While the idea of gradient descent has been around for decades, it’s only recently that it’s been applied to applications related to deep […] The post Implementing Gradient Descent in PyTorch appeared first on MachineLearningMastery.com.  ( 25 min )

  • Open

    Training a Linear Regression Model in PyTorch
    Linear regression is a simple yet powerful technique for predicting the values of variables based on other variables. It is often used for modeling relationships between two or more continuous variables, such as the relationship between income and age, or the relationship between weight and height. Likewise, linear regression can be used to predict continuous […] The post Training a Linear Regression Model in PyTorch appeared first on MachineLearningMastery.com.  ( 24 min )
    Making Linear Predictions in PyTorch
    Linear regression is a statistical technique for estimating the relationship between two variables. A simple example of linear regression is to predict the height of someone based on the square root of the person’s weight (that’s what BMI is based on). To do this, we need to find the slope and intercept of the line. […] The post Making Linear Predictions in PyTorch appeared first on MachineLearningMastery.com.  ( 21 min )

  • Open

    Loading and Providing Datasets in PyTorch
    Structuring the data pipeline in a way that it can be effortlessly linked to your deep learning model is an important aspect of any deep learning-based system. PyTorch packs everything to do just that. While in the previous tutorial, we used simple datasets, we’ll need to work with larger datasets in real world scenarios in […] The post Loading and Providing Datasets in PyTorch appeared first on MachineLearningMastery.com.  ( 20 min )

  • Open

    Using Dataset Classes in PyTorch
    In machine learning and deep learning problems, a lot of effort goes into preparing the data. Data is usually messy and needs to be preprocessed before it can be used for training a model. If the data is not prepared correctly, the model won’t be able to generalize well. Some of the common steps required […] The post Using Dataset Classes in PyTorch appeared first on MachineLearningMastery.com.  ( 21 min )

  • Open

    Calculating Derivatives in PyTorch
    Derivatives are one of the most fundamental concepts in calculus. They describe how changes in the variable inputs affect the function outputs. The objective of this article is to provide a high-level introduction to calculating derivatives in PyTorch for those who are new to the framework. PyTorch offers a convenient way to calculate derivatives for […] The post Calculating Derivatives in PyTorch appeared first on Machine Learning Mastery.  ( 20 min )

  • Open

    Two-Dimensional Tensors in Pytorch
    Two-dimensional tensors are analogous to two-dimensional metrics. Like a two-dimensional metric, a two-dimensional tensor also has $n$ number of rows and columns. Let’s take a gray-scale image as an example, which is a two-dimensional matrix of numeric values, commonly known as pixels. Ranging from ‘0’ to ‘255’, each number represents a pixel intensity value. Here, […] The post Two-Dimensional Tensors in Pytorch appeared first on Machine Learning Mastery.  ( 21 min )

  • Open

    One-Dimensional Tensors in Pytorch
    PyTorch is an open-source deep learning framework based on Python language. It allows you to build, train, and deploy deep learning models, offering a lot of versatility and efficiency. PyTorch is primarily focused on tensor operations while a tensor can be a number, matrix, or a multi-dimensional array. In this tutorial, we will perform some […] The post One-Dimensional Tensors in Pytorch appeared first on Machine Learning Mastery.  ( 22 min )

  • Open

    365 Data Science courses free until November 21
    Sponsored Post   The unlimited access initiative presents a risk-free way to break into data science.     The online educational platform 365 Data Science launches the #21DaysFREE campaign and provides 100% free unlimited access to all content for three weeks. From November 1 to 21, you can take courses from renowned instructors and earn […] The post 365 Data Science courses free until November 21 appeared first on Machine Learning Mastery.  ( 15 min )

  • Open

    Attend the Data Science Symposium 2022, November 8 in Cincinnati
    Sponsored Post      Attend the Data Science Symposium 2022 on November 8 The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all day in-person event will have three featured speakers and two tech talk tracks with four concurrent presentations in each track. The […] The post Attend the Data Science Symposium 2022, November 8 in Cincinnati appeared first on Machine Learning Mastery.  ( 10 min )

  • Open

    My family's unlikely homeschooling journey
    My husband Jeremy and I never intended to homeschool, and yet we have now, unexpectedly, committed to homeschooling long-term. Prior to the pandemic, we both worked full-time in careers that we loved and found meaningful, and we sent our daughter to a full-day Montessori school. Although I struggled with significant health issues, I felt unbelievably lucky and fulfilled in both my family life and my professional life. The pandemic upended my careful balance. Every family is different, with different needs, circumstances, and constraints, and what works for one may not work for others. My intention here is primarily to share the journey of my own (very privileged) family. Our unplanned introduction to homeschooling For the first year of the pandemic, most schools in California, where …  ( 7 min )

  • Open

    The Jupyter+git problem is now solved
    Jupyter notebooks don’t work with git by default. With nbdev2, the Jupyter+git problem has been totally solved. It provides a set of hooks which provide clean git diffs, solve most git conflicts automatically, and ensure that any remaining conflicts can be resolved entirely within the standard Jupyter notebook environment. To get started, follow the directions on Git-friendly Jupyter. Contents The Jupyter+git problem The solution The nbdev2 git merge driver The nbdev2 Jupyter save hook Background The result Postscript: other Jupyter+git tools ReviewNB An alternative solution: Jupytext nbdime The Jupyter+git problem Jupyter notebooks are a powerful tool for scientists, engineers, technical writers, students, teachers, and more. They provide an ideal notebook environment for interact…  ( 7 min )
2023-10-31T00:43:44.567Z osmosfeed 1.15.1